Still on multilingual blogging, I have recently met (thanks to IRC) Patrick and Luke, who are both interested in the linguistic and multilingual aspects of blogging and have good ideas to contribute.

Luke, in particular, has a proposal for “distributed translation” of blog content by bloggers who typically blog about related topics in a different language and are sufficiently competent translators. For Creative Commons licensed posts there is actually nothing else to implement or code: people could start right away.

As Suw, who blogs in English and Welsh, explains, even in the absence of blogging platforms that specifically support multilingual blogging, Technorati tags or a similar mechanism could serve to “tie together” the different versions of a post, plus related posts in whichever language they are written.

Edit: I added the hreflang attribute to the links, as Kevin suggested. Thanks! The indicators of the link languages that appear in pale violet after each link are visible in all browsers that are CSS2 compliant, i.e. the vast majority of modern browsers, Internet Explorer (any version) being the exception.

The lack of recent posts on this blog is due to my going through a rather deep low at the moment. I’m exhausted (from doing nothing in particular), my concentration is spotty, and so is my short-term memory. So I read half-paragraph by half-paragraph and write one sentence fragment at a time.

Even though one characteristic point of these blue phases is that nothing, by itself, will provide a comprehensive cure, this is not a reason to abstain from small attempts to do something pleasant. Which means, in the easiest case, turning to chocolate and associated products.

So this Libération article comes at the right time. We learn from it not only that Ferrero’s annual production is large enough that you could cover the equator with all the Nutella jars lined up, but also that Nutella is … left-wing. In Italy, I mean. But apparently the neo-fascists are trying to even things out a bit, and the Forza Italia guys (Berlusconi’s crowd, in case you didn’t know) are holding “Nutella parties”.

Now, I’m in France, and haven’t noticed a particular political preference, either for the icon or in actual consumption. But we have other problems here, about the gender. The grammatical gender, obviously.

In French, brand names usually get their gender from the underlying product type, even if they are not typically used as modifiers (which would have to agree with the noun they modify). Thus, car brands are all feminine (la voiture)[1]: une Ford, une Porsche, and even une Mondeo (despite the -o that points to a masculine name) and une Golf (although the noun golf, the sport, is masculine). If this method isn’t applicable, a masculine default applies.

For Nutella, there are two reasons to expect it to be feminine: the suffix -ella, which every speaker of French would expect to create a feminine-gendered diminutive, and the underlying product la pâte (à tartiner), or la crème, maybe.

Yet, Google is very clear on this: 27,000 hits for “[le | du | au] nutella” vs only 865 for the feminine form.

I wasn’t totally convinced, so I conducted some field research while nipping out for a bottle of milk at the Moroccan grocery store that is open on Sunday evening, with a stop at the café next door[2]. The result wasn’t quite as clearly in favour of le Nutella, but the preference is there. Strangely enough, if Nutella is used with a partitive determiner/preposition plus definite article to denote an unspecified quantity of a specific instance of Nutella, as in « Tu veux encore du / de la Nutella ? » (“Do you want some more Nutella?”), some speakers who otherwise opted for the masculine gender preferred the feminine form.

Let’s explore this a little further. In German, you have to choose between three genders. A neuter (or masculine) default could be assumed in the absence of other criteria, but German is divided into many dialects that often have their own rules about genders and cases of inanimate nouns.

Ferrero is (sort of) helpful by saying that since “nutella [lowercase in German, it seems, like on the labels] is a fantasy name that is registered as a brand name, it is used without article in general” and that everyone can decide for themselves which article to use in case one is needed.

Personally, I say die Nutella (can’t really bring myself to write it in lowercase letters right now). It is so obviously an Anglo-Italian hybrid, and for me the suffix should determine the gender. Car brands, by the way, are masculine in German (even those that are derived from Spanish female first names).

[1]: Automobile, which was an adjective before becoming a noun, used to be admissible in the masculine (due to un véhicule automobile). Nowadays, only the feminine form is considered correct.

[2]: Note to researchers: don’t ask questions in a Parisian café at the very moment when Monaco scores against Paris St. Germain.

Locali(s|z)ation and internationali(s|z)ation

Quelques remarques au sujet de la localisation ou internationalisation linguistique, des outils et leurs failles, et du cas particulier des blogues.

  • 2005-01-22

By one definition, localization (the US spelling seems to be dominant across varieties of English) is

[t]he process of adapting text and cultural content to specific target audiences in specific locations. The process of localization is much broader than just the linguistic process of translation. Cultural, content and technical issues must also be taken into account.

Since I started lending a hand with making the WordPress blogging software usable for multilingual blogs, I have been running into the difficulties of this process.

Internationalising a blog is not the same as localizing software, though. I have written more on this on the palimpsest wiki.

A commonly used tool in the world of free software is gettext. Its approach sounds reasonable and straightforward: extract strings in the language the software was originally written in and substitute text in the target language(s). Until you try to use it, that is. Via LaughingMeme, I found a detailed account of the shortcomings of gettext by Sean M. Burke and Jordan Lachler: a “localization horror story” about the simple task of translating the program alerts “I scanned N directories” and “Your query matched N files in M directories” into Arabic, Italian, Chinese and Russian. Sounds easy? Not so fast …

The Chinese guy replies with the one phrase that [all variations of the second sentence] translate to in Chinese, and that phrase has two “%g”s in it, as it should — but there’s a problem. He translates it word-for-word back: “In %g directories contains %g files match your query.” The %g slots are in an order reverse to what they are in English. You wonder how you’ll get gettext to handle that.

But you put it aside for the moment, and optimistically hope that the other translators won’t have this problem, and that their languages will be better behaved — i.e., that they will be just like English.

But the Arabic translator is the next to write back. First off, your code for “I scanned %g directory.” or “I scanned %g directories.” assumes there’s only singular or plural. But, to use linguistic jargon again, Arabic has grammatical number, like English (but unlike Chinese), but it’s a three-term category: singular, dual, and plural. In other words, the way you say “directory” depends on whether there’s one directory, or two of them, or more than two of them. Your test of ($directory == 1) no longer does the job. And it means that where English’s grammatical category of number necessitates only the two permutations of the first sentence based on “directory [singular]” and “directories [plural]”, Arabic has three — and, worse, in the second sentence (“Your query matched %g file in %g directory.”), where English has four, Arabic has nine. You sense an unwelcome, exponential trend taking shape.

Your Italian translator emails you back and says that “I searched 0 directories” (a possible English output of your program) is stilted, and if you think that’s fine English, that’s your problem, but that just will not do in the language of Dante. He insists that where $directory_count is 0, your program should produce the Italian text for “I didn’t scan any directories.”. And ditto for “I didn’t match any files in any directories”, although he says the last part about “in any directories” should probably just be left off. […]

Then your Russian translator calls on the phone, to personally tell you the bad news about how really unpleasant your life is about to become:

Russian, like German or Latin, is an inflectional language; that is, nouns and adjectives have to take endings that depend on their case (i.e., nominative, accusative, genitive, etc…) — which is roughly a matter of what role they have in syntax of the sentence — as well as on the grammatical gender (i.e., masculine, feminine, neuter) and number (i.e., singular or plural) of the noun, as well as on the declension class of the noun. But unlike with most other inflected languages, putting a number-phrase (like “ten” or “forty-three”, or their Arabic numeral equivalents) in front of noun in Russian can change the case and number that noun is, and therefore the endings you have to put on it.

He elaborates: In “I scanned %g directories”, you’d expect “directories” to be in the accusative case (since it is the direct object in the sentence) and the plural number, except where $directory_count is 1, then you’d expect the singular, of course. Just like Latin or German. But! Where $directory_count %10 is 1 (“%” for modulo, remember), assuming $directory_count is an integer, and except where $directory_count %100 is 11, “directories” is forced to become grammatically singular, which means it gets the ending for the accusative singular… You begin to visualize the code it’d take to test for the problem so far, and still work for Chinese and Arabic and Italian, and how many gettext items that’d take, but he keeps going… But where $directory_count %10 is 2, 3, or 4 (except where $directory_count %100 is 12, 13, or 14), the word for “directories” is forced to be genitive singular — which means another ending…
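To make the quoted rules a little more concrete, here is a rough Python sketch (not code from the article, and not how gettext itself works). It shows two things the translators ran into: picking a noun form from the count, and letting each language place the numbers in its own order. All names (russian_form, arabic_form, TEMPLATES, render) are made up for illustration. The Arabic function follows the three-way split described in the quote, which is a simplification, and the final "genitive plural" branch is my assumption, since the excerpt breaks off before saying what happens to the remaining counts.

```python
# Illustrative sketch only: these names are hypothetical and belong neither
# to gettext nor to the quoted article.

def russian_form(n: int) -> str:
    """Noun form required after the whole number n, per the quoted rule."""
    if n % 10 == 1 and n % 100 != 11:
        return "singular"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "genitive singular"
    # Assumption: the excerpt stops before this case; standard Russian
    # grammar uses the genitive plural for all remaining counts.
    return "genitive plural"


def arabic_form(n: int) -> str:
    """Three-way split as described in the quote (a simplification)."""
    if n == 1:
        return "singular"
    if n == 2:
        return "dual"
    return "plural"


# Named slots let each translation put the numbers in its own order.
TEMPLATES = {
    "en": "Your query matched {files} files in {dirs} directories.",
    # Word-for-word gloss of the Chinese sentence, taken from the quote.
    "zh-gloss": "In {dirs} directories contains {files} files match your query.",
}


def render(lang: str, files: int, dirs: int) -> str:
    """Fill the template for lang; slot order is decided by the template."""
    return TEMPLATES[lang].format(files=files, dirs=dirs)


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21, 112):
        print(n, russian_form(n), arabic_form(n))
    print(render("zh-gloss", files=3, dirs=2))
```

In a real catalogue each form name would key a separately translated string. The sketch only shows that choosing the form is per-language logic that has to live somewhere alongside the strings.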

This said, for translations of single words, or text without variables, especially in a short script, gettext is perfectly adequate. But there’s another problem: blogs, while technically software (PHP scripts, in our case), face different problems from desktop utilities or the like. The text to be translated needs to be user-editable. Every blog is different, and bloggers will want the text, any bit of text, to appear just as they prefer it. Which, for the moment, is quite difficult to achieve on a multilingual blog.

Which reminds me once again how regrettable it is that written communication is best conducted in one language at a time. Spoken communication is much more flexible in this regard. (One exception is discussions on IRC or other public chat channels: I’ve often found it useful to carry on two separate conversations with the same interlocutors in two different languages; it’s easier to keep the conversations apart this way.)

With Morgan Doocy (who, unlike me, actually knows how to code in PHP), I am working on a plugin to make WordPress comprehensively suitable for multilingual blogging. Of course, we have a lot of ideas about what we expect from a multilingual blogging tool (you may have noticed that this blog is already bilingual-and-a-half). […]


Garden paths

Les chemins du jardin qui mènent dans la brousse. Syntaxiquement parlant.

Microsoft debuts a malicious software removal tool today. (link) — Just glad I don’t have any Microsoft software on my computer any more. I might inadvertently install the malicious tool. Powell Surveys Devastated Area — A headline quoted from memory, from, I think, USA Today (which would have been USA Yesterday, or rather USA The-Previous-Day), which […]


A blogger on the radio

Un blogueur (britannique) à la radio (écossaise).

  • 2005-01-11

Tom Reynolds, who blogs at Random Acts of Reality about being an emergency medical technician in east London, has been on BBC Scotland talk radio (see also this post). The Real Media file of the segment he was on is here, for a few more days. The programme talks about potential problems that might arise when […]

  • 2005-01-11

After letting Guillermito’s case settle down in my mind (and on the web) for a few days, some final notes (to complete what I wrote in parts one and two). First, I got to meet Veuve Tarquine, the charming and knowledgeable law blogger, at Paris Carnet. She explained to me that the problem with Tegam’s (the […]


The French noun rencontre signifies a chance meeting, an intersection of one’s path with that of someone else. Sometimes, the paths run in parallel for a while, often they diverge again quickly. The English quasi-equivalent is “encounter”, but as always, the connotations are just a little different. This post is about three such encounters. All […]


The new site design just went live. Some fine-tuning will follow. The new theme[1] has been tested in standards-compliant browsers. Corrections to take Internet Explorer’s defective CSS interpretation into account will follow when I have time to log into Windows and fiddle with the style. It shouldn’t look too bad even now. [1]: yes, a […]


Anagram poetry has taken hold. Here are three attempts, and several more are in the works. Each poem is dedicated to an online or offline friend. Should you recognize yourself, you can keep yours. ah bland honey jar ann had herbal joy rehab only had jan oh jan, bleary hand! handy banjo haler heal nonhardy jab posh hebetation hip banshee […]
