Show me your vowels!

The Scottish accent, or: how to analyse vowels when you have trouble telling them apart by ear alone.

This is a bit of a side-piece to the investigation into the pronunciation of the and a — reduced or unreduced? in which context does which form occur? My previous posts are here and here, Mark Liberman’s principal ones here, here, here and here, and David Beaver chipped in here and here.

Looking into when a speaker says [ði] or [ðə], [ɛj] or [ə], or something in the middle, like [ðɛ], isn’t a hard task. You just have to listen to a recording and write down what is said and how, right? Well, yes, if you can indeed hear what’s going on in someone’s speech.

So I was digging up speakers of not-quite-mainstream English accents, and was stumped by Mark Hunter, a Glaswegian who has a podcast on Scottish music. There were two problems with his clips: First, “The Tartan Podcast” is stunningly good, and I was tempted to just listen to the spectacular rock music#[1]. Second, and more seriously, I couldn’t for the life of me decide when Mark Hunter’s articles were reduced, and when he was employing unreduced vowels.

This is not a problem of understanding what he is saying. Understandability is very much a subjective thing, and indeed I find Mark Hunter’s speech rather euphonious, and much easier to transcribe than others I’ve listened to; for some reason it’s some Californians who require more concentration from me than others — it’s a question of being used to hearing a particular accent.

No, the difficulty came from me not being able to distinguish clearly between his instances of [ðə] and [ðɛ] and [ðʌ]; there’s even a [ði] here and there, along with other vowels, but just by ear it sounded random and all over the place, even before vowels. I was quite unable to tell by ear which of these vowels would have to count as reduced and which as unreduced.

There’s a more technical way of looking at sounds: to plot the principal frequencies in a diagram. With the canonical orientation of the axes of the plot you get a vowel chart — see for example the collection of vowel charts for a number of European languages on the University of Helsinki site: each vowel phoneme of a language occupies its own area of the plot; those along the top are called “high”, those at the bottom “low”; left is “front” and right is “back”.#[2]
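A minimal sketch of that axis orientation, assuming the two "principal frequencies" are the first two formants, F1 and F2. The formant values are rough textbook-style figures for an adult male voice, included purely for illustration; they are not measurements from any recording discussed here, and the function names and the 500 Hz / 1500 Hz cut-offs are hypothetical.

```python
# Rough, textbook-style formant values in Hz (adult male voice).
# Illustrative assumptions only, NOT measured from the recordings in this post.
ILLUSTRATIVE_FORMANTS = {
    "i": (270, 2290),
    "u": (300, 870),
    "ɑ": (730, 1090),
    "ʌ": (640, 1190),
}


def chart_position(f1_hz, f2_hz):
    """Map (F1, F2) to (x, y) on a vowel chart.

    Both axes are flipped: a high F2 lands on the left ("front"),
    a low F1 lands at the top ("high").
    """
    return (-f2_hz, -f1_hz)


def describe(f1_hz, f2_hz, f1_mid=500, f2_mid=1500):
    """Crude two-way labels; the midpoints are arbitrary assumptions."""
    height = "high" if f1_hz < f1_mid else "low"
    backness = "front" if f2_hz > f2_mid else "back"
    return f"{height} {backness}"


for symbol, (f1, f2) in ILLUSTRATIVE_FORMANTS.items():
    print(symbol, chart_position(f1, f2), describe(f1, f2))
```

The two-way labels are far too coarse for central vowels like [ə] or [ʌ]; the point is only to show why [i] ends up in the upper left-hand corner and [ɑ] towards the bottom.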

Doing this for the vowels of the in a small (3 min) excerpt of Mark Hunter’s speech, we get this:

[ə] is located. But what about the unreduced articles, which are supposed to occur before words starting with a vowel, sound like [ði] and therefore be located in the upper left-hand corner?

Well, Glaswegian is obviously more complicated than that. The before arts, artist, ultimate and answer has a vowel that’s in the area of [ʌ] or even [ɑ]. Apparently, the before vowels (and before some consonants, like /m/ and /w/) gets assimilated to the next vowel-like sound in the following word (vowel-like, because /w/, which sort of sounds like [ʊ], counts as well). A fair number of the vowels are also realized as diphthongs, sliding from one value (around [ə] most of the time) at the start of the vowel to another at the end. And they aren’t any longer than normal for that: pretty short actually, around 50 ms.

And that’s pretty hard to hear. Short of making spectrograms of each vowel, I’m not sure how to find unreduced articles in unexpected places. It’s a mess.

As a point of comparison, I wanted to hear (and record) the same words pronounced by speakers of more standard dialects. Since I live in France, finding a native speaker of English would have been a hard task if it hadn’t been for the wonders of Internet Relay Chat. John “DogBoy” Tocher (male, originally from San Francisco, now living in Houston, Texas) and Stephane Miller (female, originally from the Midwest, now living in Australia) were kind enough to record themselves saying the words on the labels in the above chart and to send me the mp3 files.

The method is far from ideal, though, because reading out isolated word combinations is quite a different task from spontaneous speech. But still, let’s have a look at John’s vowels:

That’s more like it! We still see the “sliding” in the ultimate, from the [i] of the unreduced the towards the vowel in ultimate, and the effect from words starting with /w/, but at least the artist, the other, the arts and the answer are firmly in [i] (or [ɪ]) territory, where we’d expect to find them.

Strangely, John pronounced the April and the e-mail with a reduced article. This may have been due to the artificial character of the exercise.

Now Stephane’s recording stumped me. I won’t even plot it: she pronounced every single the with a reduced vowel, i.e. [ðə]. This had to be an artifact, right? Still, I went back to her and asked whether she always spoke that way, and she replied that she thought she did, except when talking about “Thee Temple ov Psychick Youth” (see also this Google search) because, as she said, “the spelling is weird”. Be that as it may, this is at least one more item on the list of thee subcultures (and I have yet another one, which I’ll blog later).

Back to Mark Hunter and his Scottish vowels. He must have gotten a few remarks on his accent from his listeners and other podcasters. Which is why he is offering some advice on “how to cope” with his accent. And sound advice it is. Listen to it here (lo-fi mp3). This was in his Tartan Podcast number 15 — and he’s at number 48 now.

Okay, I’m off to listen to the Tartan Podcast Sleepy Sunday Show.


[1]: If you are into this check out Gum, Electrum, Finniston, Team Salt, Ally Kerr, Miss The Occupier, Conestone… I’d buy a CD by any of these singers and bands in a blink.
[2]: This page on the University of Manitoba site has a very nice and gentle introduction to English vowels. I have also been collecting links to phonetics/phonology resources on my wiki — thanks to whoever corrected a spelling error there some months back.


No word too small

How the articles of English weave ties between human beings, provided that they blog [hey, that’s a subjunctive in French!].

You know, a little over a year ago, I was wondering whether blogging was an activity I should take up. I was hesitant for a while because it seemed you had to be either your own journalist, which I am not, or to spend a considerable amount of time gazing at your own navel.

I was, of course, wrong. Not because there aren’t a lot of navel-gazing blogs out there — and it’s a perfectly fine activity, if you’re into it. No, what I had underestimated was the community aspect of blogging. In addition to local meet-ups and communal efforts organised around a particular blogging platform, it’s the tool itself that makes sure that bloggers whose interests overlap in some way or other will find each other if they are so inclined.

I’ve been quite touched by how easy it is for us to develop social bonds. Leaving comments may develop into exchanging e-mail or hanging out in the same IRC channel. Those who know me in the corporeal world may be aware that I can be painfully shy about things like that. When I dropped off the web for two months earlier this year, my ircquaintances and those involved in the same online projects kept enquiring after me, which helped bring me back.

Now, when Mark Liberman (to whose Language Log posts I’ve sometimes been playing sort of a Greek chorus via email and blog entries here) became interested in the pronunciation of the and a, his post showed up in my aggregator, and I was, as so often, intrigued. So I looked at recordings by speakers I had listened to before and happened upon one by Ed Felten. Ed Felten, in turn, saw my post, either in his referrer stats or via a backlink-and-blog search engine like Technorati (hi Kevin Marks, by the way), and left a comment. I was just glad that he took my dissection of his speech in good humour, when I realised he had even taken up the small linguistic interest Mark Liberman and I have been taking in his thes and as on Freedom to Tinker, his own blog. Oh, and Mark Liberman, in turn, replied to Ed Felten’s post, where this post should be showing up as a trackback, if all goes right. La boucle est bouclée (“the loop is looped around”; or, etymologically, if not semantically, closer, “the buckle is buckled”) as they say here.

I can understand Ed Felten’s consternation at being commented on, for once, not for his forceful insights into DRM, copyright, and the way the law and the actions of big media companies shape the public debate (he is used to that), but for his language. And not even for something solid like verbs and nouns, or grammar, or semantics, but for the way he pronounces his articles.

Well, I find the topic quite fascinating. Mark Liberman quite rightly analysed the instances not, as I did at first, as “correcting one’s pronunciation”, but looked strictly at whether the articles precede an utterance that starts with a vowel or a consonant sound (putting aside [j] — as in united or university — and [h] for the moment). And if Ed Felten now finds himself “listening to every speaker [he] hear[s], to see whether they do it too”, I have fallen victim to a similar fixation, hearing “ph.-pr.” (now how does he pronounce that) unreduced the [ði] and a [ɛj] literally#[1] everywhere.

Listen, for example, to this snippet (.wav) from an interview with the copy editor, author and blogger Bill Walsh:

  • wild art is the uh newsroom term for a stand-alone photograph

We have an unreduced the before the disfluency “uh”; and an unreduced a before a consonant, without any pause, pseudo or not.

This fascination rather reminds me of one of the most engaging teachers I had in high school, back in Germany. He was a trainee teacher during the two semesters he took our class for Ancient Greek, and he made all the little errors and exhibited all the insecurities this state brings with it#[2]. But he did know his stuff, and the way he approached the horrors of the irregular verbs and the labyrinth of Greek adverbs and particles (with love and tender care) somehow got through to us. When he told us of his Master’s thesis on, if I remember correctly, frequencies of a number of particles and elisions in Homer, the bunch of 16-year-olds that we were could only frown in consternation. But his passion was quite unstoppable, and we even learnt something.

(If only how to deduce the translation passage on the exam on the Iliad he gave us — he had dropped too many hints about a rare sense of a particular verb here and an elided particle there, and we were quite capable of searching through the text ourselves. We did tell him, over a beer, when the entire class went out grilling sausages just before the end of the school year and had invited him along. And he had been so proud that eight out of a class of 27 had managed the grade 1 (roughly, an A) on what had supposedly been a tough exam on the hardest text you do in your third year of high-school Ancient Greek…)


[1]: This is the spurious usage of literally, of course, which serves to reinforce a statement.
[2]: I sympathise very much. My own two years were not an easy time, to put it mildly; suffice it to say that I didn’t survive in the French public school system. Neither, to my knowledge, did he in the German one.


Thy “thee”s, Ed Felten…

Some observations on the pronunciation, reduced or full, of the articles a and the before consonants in a sample of spoken American English.

Some of Mark Liberman’s recent Language Log posts dealt with reduced vs. unreduced vowels in the pronunciation of the articles a and the. (Reduced: [ə] and [ðə]; unreduced: [ɛɪ] (or [ɛj]) and [ði:].)

In his latest post, he examined a G. W. Bush speech and found that, as other readers had claimed, Bush indeed pronounces a before consonants sometimes with an unreduced vowel (without any indication that this is done for emphasis).

This didn’t surprise me. Indeed, in my (vague) memory I thought I had noticed the same. I can’t back this up, but it has been my impression that this is a peculiarity of American public speech, for some speakers only. I once listened to a British politician (I forget who), who did the same, and to me it made him sound more “American”.#[1]

I also seemed to remember that Ed Felten did something similar. I couldn’t find the talk of his I had watched on a video — and it was over an hour long, which is a bit excessive anyway. But there are other audio and video files of his presentations online. I used an 8½ min audio file, which can be downloaded (with a transcript) from Lisa Rein’s site.

The transcript (somewhat corrected, one eggcorn eradicated) with all the occurrences of the and a marked and colour-coded is on a page of its own.

The result is a bit different from Bush’s pronunciation. (The marked articles are the only ones that are pronounced in a surprising manner. All the others follow the standard pattern, unreduced before vowel, reduced before consonant.)

  • There is only one unreduced a before consonant (this was characteristic of Bush):
    • My third example comes from a question that Barbara Sarmonds asked yesterday about electronic voting.
  • Three times, Felten doesn’t reduce the vowel in the before consonant (not counting one occurrence of the United States, where unreduced the is common in American English):
    • I want to talk instead about what the impact of DRM is on the public policy process related to other issues, that is, my argument will be that DRM not only is a public policy issue [in] itself, but has a [significant] negative impact on the public policy debate.
    • […] sometimes people say that the device is an appliance, although that’s also a misnomer, it’s not like any normal appliance you might have in your house; […]
    • So as a result of all of this, DRM and the [uh] things that come with DRM turn technological devices into black boxes.
  • More surprisingly, Felten sometimes starts the with an unreduced vowel before consonant, then catches himself and after a pause and/or some ho-humming, says it again, with a reduced vowel. The first three are in two consecutive sentences, and then there are two in isolated sentences later on:
    • But all of these things really mean that the technology is supposed to be a black box, you’re not supposed to be able to look inside of it. And this black box effect tends to grow over the scope of the system for example if you’re talking about a computer system you might say well only the part that deals with the media has to be a black box the boundaries of that black box tend to grow because there’s concern that the content will be grabbed off of the video card or the audio card that it would be grabbed off of the disk, that it will be grabbed as it goes across the system’s IO bus and so on.
    • And possibly, the black box nature of the systems is backed by laws like the DMCA that tend to ban analysis or tinkering or discussion related to the device.
    • The big problem, though, is the risk of fraud.
  • Once, he does exactly the same (unreduced, pause, reduced) with a:
    • At the end of the election, it spits up a count of how many votes were cast for each candidate, or at least we hope it does that.
  • Finally, in the case of one the before a noun beginning with a consonant, he does the reverse: starting out with reduced [ðə], and correcting himself to [ði:]. This happens when he mentions a crucial example:
    • The first one was mentioned by Dave Farber this morning, the Total Information Awareness Program.
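The bookkeeping behind a list like this can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure actually used for the transcript: the token list is hypothetical hand-coded data, the names `expected_form` and `is_surprising` are made up for illustration, and self-corrections (unreduced, pause, reduced) are not modelled at all.

```python
from collections import Counter


def expected_form(next_sound):
    """Standard pattern: unreduced before a vowel sound, reduced before a consonant sound."""
    return "unreduced" if next_sound == "vowel" else "reduced"


def is_surprising(next_sound, heard):
    """A token is 'surprising' when what was heard departs from the standard pattern."""
    return heard != expected_form(next_sound)


# Hypothetical hand-coded tokens, for illustration only:
# (article, following word, sound the following word starts with, form heard)
tokens = [
    ("the", "impact", "vowel", "unreduced"),
    ("the", "device", "consonant", "unreduced"),
    ("a", "question", "consonant", "unreduced"),
    ("the", "election", "vowel", "unreduced"),
    ("a", "black", "consonant", "reduced"),
]

# Keep only the tokens that break the pattern, then count them per article.
surprising = [t for t in tokens if is_surprising(t[2], t[3])]
tally = Counter(article for article, _, _, _ in surprising)

for article, word, next_sound, heard in surprising:
    print(f"{heard} '{article}' before {next_sound}-initial '{word}'")
print(dict(tally))
```

Note that the coding is by sound, not spelling: a word like united would have to be entered by hand as consonant-initial, which is exactly the [j] case that is put aside elsewhere.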

What does this mean? No idea, I’m just playing around. Clearly, unreduced vowels take a little more time, and command more attention, than reduced ones. Maybe Felten’s little back-and-forth game gave him the, as we say in German, Denkpause (a think-pause, a pause for thinking) he needed at the beginning of his talk, where, as the full transcript shows, most of these self-corrections happened.

Note that unlike Bush’s example, this is not a fully scripted political speech, but an academic talk.


[1]: Of course, these extraneous unreduced vowels in articles before consonants could simply be a sporadic occurrence among speakers of all varieties of English, and my link to American English specifically completely spurious.


Transcribing another unknown language

Another quiz on Language Log. We love them.

Mark Liberman at Language Log has posted a second transcribe-and-guess-the-language quiz. I believe most readers of this blog are interested in this sort of question, so you probably know this already. As one of those who got the first one right, I couldn’t resist of course. (More seriously, though, it’s an excellent exercise.) I have followed my […]


More on dealing with unknown languages

Another instalment on the mystery of the mystery languages.

First of all, I was right, and so was caelestis at (or le?) sauvage noble: the mystery language is Romansh. It is interesting to look at the differences between our approaches. Caelestis writes in his comment section: For the record, I should state that all I went on was the MP3, the exercise having […]


Transcribing an unknown language

My answer to a challenge to guess a language from a recording and to transcribe it phonetically.

This is a reply to Mark Liberman’s challenge to a) guess the language on a recording and b) transcribe it. I’ve never transcribed anything but English, and this more often into phonemes than phonetically (i.e., writing down actual heard sounds, which is much more difficult). Even though I’m not a card-carrying linguist (but seriously thinking […]
