Transcribing another unknown language

Another quiz on Language Log. We love them.

Mark Liberman at Language Log has posted a second transcribe-and-guess-the-language quiz. I believe most readers of this blog are interested in this sort of question, so you probably know this already. As one of those who got the first one right, I couldn’t resist, of course. (More seriously, though, it’s an excellent exercise.)

I have followed my new fondness for wikis and used first my local one, then my new (and still quite empty) online wiki to work progressively on my solution. To be fair, my brother and I conferred a bit on Jabber: he was surprisingly enthusiastic, and provided support and his own remarks.

Now, the wiki page (which has a timestamp) won’t be changed (spelling errors and all) until Mark has come out with the solution.

Once the work was as complete as it’s likely to get, I did have a look to find out whether caelestis at sauvage noble might have taken up the challenge, too. It turns out he has — I swear, I didn’t change a letter of my wiki post after looking, and won’t.

Caelestis’ transcription and mine are reasonably similar. He transcribes some sounds as a where I opt for an (open) o, and he might be right. Especially if the language is Somali or a close relative. His morphological analysis is more sophisticated than mine. He rejects Somali, however, on grounds of prosody. I’m not so sure about that. Somali is supposed to have tonal accent (caelestis calls it “pitch accent” for the samples), which is a point in favour. Still, I’m far from convinced myself.

Edit: My certainty level, which has been rising over the day after listening to the Somali recordings on this site (via caelestis), has just taken another hike, thanks to my dear brother, who doesn’t have a web presence yet (even though his sister has been nagging). Mark Liberman’s tantalizing “European Event” immediately made me think of the murder of the Dutch filmmaker Theo van Gogh, but I failed to see the connection to Somalia, which is in fact obvious: his film Submission, about violence against women and said to contain harsh criticism of Islam, was written by Ayaan Hirsi Ali, a Somali refugee and now a liberal (ie conservative) member of parliament in the Netherlands. Letters containing death threats against her were pinned to his body with a knife.

Update: The mystery language was indeed Somali. I have updated the documentation of my efforts on the wiki — conclusions and general remarks added.


More on dealing with unknown languages

Another installment on the mystery of mystery languages.

First of all, I was right, and so was caelestis at (or le?) sauvage noble: the mystery language is Romansh. It is interesting to look at the differences between our approaches. Caelestis writes in his comment section:

For the record, I should state that all I went on was the MP3, the exercise having been billed as a quiz. I just played it looped and transcribed and transcribed.

Well, that’s what I did, too, except that I googled first and only started transcribing after settling on Romansh. But then he goes on:

I also decided not to transcribe too narrowly, opting instead to approximate orthography, on the principle that hints about historical spelling might also betray something of the text’s language’s history.

I, on the other hand, took it as an exercise in using IPA, playing around with the values of the symbols and trying to get them right. Of course, I was also keen on actually understanding the recording, so I did think about the sense (adjective + noun combinations were particularly easy to identify, pi might mean more (plus in French); I neglected to look for possible occurrences of the conjugated forms of be, which would have been logical to do).

Focussing on the sound and the intricacies of IPA led to a few difficulties. For example, the speaker says religiun(s) several times, but sometimes the stressed syllable comes out as [dʒun] (with a voiced postalveolar fricative), sometimes in my ears more like [djun] (palatal approximant) or, between the two, [dʝun] (palatal fricative). Should I have made a distinction between the occurrences? I decided against a transcription that narrow.
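For what it is worth, the decision not to transcribe that narrowly can be written down as a tiny lookup. This is only a sketch of my own choice, not a claim about Romansh phonology: the table name NARROW_TO_BROAD and the function broaden are made up for the illustration, and the three variants are simply the ones I mention above.

```python
# Hypothetical helper: the table records my own decision to write all three
# heard variants of the stressed syllable of "religiun" the same way.
# It is an illustration, not a rule of the language.
NARROW_TO_BROAD = {
    "dʝun": "dʒun",  # variant heard with a palatal fricative
    "djun": "dʒun",  # variant heard with a palatal approximant
}


def broaden(transcription: str) -> str:
    """Replace each narrow variant of the syllable with the broad form."""
    for narrow, broad in NARROW_TO_BROAD.items():
        transcription = transcription.replace(narrow, broad)
    return transcription


heard = ["reliˈdʒun", "reliˈdjun", "reliˈdʝun"]
print({broaden(t) for t in heard})
```

The table is keyed on the whole syllable rather than on the consonants alone, so that a [dj] in some other word is left untouched.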

One of those light bulbs that go on in one’s mind came after I wrote the last entry: Grischun is the canton of Graubünden, Grisons in French. (I’m really bad at French geographic names for places outside France that I already know the German and/or English version of. Imagine the consternation of the French friend who once told me he liked Aix-la-Chapelle, when I replied I had no idea of what place he was referring to.)

As far as understanding the recording is concerned, caelestis’ version has, in my opinion, the edge. I understood two more snippets from reading his. Other meanings came to me later. For example, the passage un model che corresponde miglier a lur habilitats (in an approximation of what the spelling might be) obviously means a model that better corresponds to their capacities. So the sentences before that should contain the antecedent of their, presumably referring to pupils. But where and what is it? It took me a thorough look through this page in Romansh dealing with school questions (actually, with the very issue of the place of the language in education) to find out that it is scolasts (for boys) or scolastas (for girls). Which in turn points to a tentative acuire as the verb of the third sentence. For the rest of it, caelestis is quite helpful: some aspects of the school system “come into question” (similar to the German idiomatic expression which could be calqued as put something into question). And so forth.

The bits on which we don’t agree (l/m/n? o/u?) don’t seem very important, and may be very hard to get “right” anyway: the language is called Rumantsch by its speakers and Romansh in English, most of the time at least. The presence or absence of the t illustrates a similar difficulty: saying [nʃ] is hard without an intruding stop/plosive ([t]). Is this stop part of “how the language is pronounced” or just of “how a particular speaker sometimes pronounces it”? It may be one or the other, depending on the particular language. But since it was unknown in the first place …

The transcriptions that the unique and in many ways excellent speech accent archive proposes aren’t totally uncontroversial either, however instructive they may be. There’s rarely a distinction between clear and dark l, and most pronunciations of the English w are transcribed as [w], even though there are quite a few instances of [v] and [β] in their samples.

The standard variety Rumantsch Grischun appears to be the equivalent of Hochdeutsch in German: a normalized compromise agreed upon for teaching purposes, to have a unified written language and, in the case of Romansh, to keep the language alive and legitimize it, but one that hardly anyone speaks in its pure form. It is, with tudestg, franzos and talian, one of the four “national languages” in Switzerland. (Calling French franzos makes a German speaker smile with amused embarrassment since this sounds vaguely insulting in German.) In German, it is perfectly acceptable if the phonetic features of one’s region of origin’s dialect shine through even in the most formal speech situations. Romansh apparently has five distinct dialects, and I agree with the Debian geeks (er, and thanks for providing the OS that runs my computer!), and Mark Liberman’s justification, that the speaker’s dialect is Surmiran. Any French reader who has made it to this point of this post will easily recognize the text used in the dialect samples. Yes, I know you prefer the version that every French school kid learns by heart.

I have not made any headway to speak of with Welsh. Those conjugation tables of bod need learning by heart, and as long as I’m hampered by a very vague understanding of the pronunciation, I hesitate. For not too long, hopefully.

Edit: I originally was wrong about the official status of Romansh in Switzerland. Corrected now. Thanks, Steph!


Transcribing an unknown language

My response to a challenge to guess a language from a recording and to transcribe it phonetically.

This is a reply to Mark Liberman’s challenge to a) guess the language on a recording and b) transcribe it. I’ve never transcribed anything but English, and this more often into phonemes than phonetically (ie, writing down actual heard sounds, which is much more difficult). Even though I’m not a card-carrying linguist (but seriously thinking about becoming one yet, despite the obvious practical and the less obvious psychological obstacles), I had a go.

Guessing the language turned out much easier than the rest. I may be wrong, of course, but I believe it’s Romansh or something very close to it. I’d have been totally lost had it been anything but a Romance or maybe Germanic language. This, though, is one in which I sometimes recognize Latin. After ruling out those among the bigger Romance languages I don’t immediately recognize (to wit, Romanian, Occitan and Portuguese), and Esperanto (just to be on the safe side), I browsed Wikipedia until I settled on Romansh. Or maybe Ladin? My opinion is influenced by my certainty of having heard the language before. This Romansh word list shows that Romansh has nouns/words ending in -un, -al and other syllables ending with a consonant, just like on the recording. And then there’s the word [griˈʒun]: its speakers call the language Rumantsch Grischun.

As for the transcription, there were many small decisions to take, and at one point I just stopped trying to be totally consistent (or even deciding in the first place) how open a particular a or e sound was or where it was articulated. Then of course there’s the question of where to make, for lack of a better term, gaps. Between words? But if you don’t know where the word boundaries are, you are lost. Whenever there is a pause? There aren’t enough of them! So I guessed, based on what I understood and guessed, and when I didn’t have anything to base myself on, I put dots between syllables and took the chance of letting words run into each other. Especially since what I take for articles is pronounced without separation from the head nouns, so they run into each other, too. Okay, here is the fruit of my labours:

la ˈʃkɔ:lɐ. ʃatṛ ˈtseintṛ ˈdəla səˈkunda part dil magaˈtsi:n. pi: ˈdjeʃtɐ pi: ɪndividuˈal. ɪn ʃkɔˈlar.sdɐ.ˈgɥi.rə ˈdβesṇ βiˈni.e.la ˈkɥi.dət.ni of moˈdel da ˈʃkɔ:lɐ. ən modˈel kə ˈkoreˌʃpondə ˈmiljṛ ɐ ˈlu:r habiliˈta:ts. pi ˌkontsilˈjant ɐ pi: tolɛˈrant. il plɐn ˌdinʃtrukˈtsjun dəˌreliˈdʒun pa.las ˈʃkɔ:lɐs el griˈʒun ˈle:vɐ a.vi.ʒiˈnal.ɐs du.as dɛˌreliˈdʒuns. da.pi ˈvargɐ i.ˈnon a.ɪl ˈplandɐ ton en ˈdruket. a ˈkɔ:vaɪ ˈvinə ˈvon.ku.la ˌinʃtrukˈtsjun də ˌreliˈdʒun ˈkɥi.ke.nus vein ˈe.ru ju: səˈvɐɪstlud stəˈpla: ˈsventṛ ˈlaura: ˈelɐ ˈbursa.

As for what it talks about, it’s the second part of a (radio?) news magazine, which deals with school issues: (changes in?) didactics to better adapt to the pupils’ abilities, and something about religious education (to be focussed on?) promoting tolerance. I have some ideas about the last sentence (the bit about the laura and the bursa), but all in all don’t get it.