The Belgian Google’s spelling problems

google.be thinks that incorrecta is a French word and that incorrectes is misspelled, which it isn’t.

And a tip to get Google to stop extending the search to inflected forms (plurals, past tenses etc.) of the words you enter: put them between quotation marks.

Via my Google queries:

google.be spelling correction: phrase incorrectes de la langue francaise - phrase incorrecta de la langue francaise

You can see that google.be tries to “correct” incorrectes to incorrecta, although incorrectes is perfectly correct. This is all the more surprising because it accepts phrase, langue and francaise. Even though the user chose the English interface, the search engine must therefore be treating these as French terms.

The screenshot also shows that Google now applies morphological substitutions in languages other than English: although the user typed phrase, the engine finds pages containing the plural, phrases. (To avoid this, put the term in straight double quotes: "phrase" will not find pages that have the word in the plural.) It also automatically recognizes francaise as a variant of française.
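
The behavior described above (unaccented francaise matching française, phrase matching phrases, and quotes switching both off) can be imitated with a small sketch. It is a toy illustration only and says nothing about how Google actually implements this: the function names `fold_accents` and `matches` are made up here, and the trailing-"s" plural rule is a crude stand-in for real morphological analysis.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and drop combining marks (é -> e, ç -> c)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query_term: str, page_word: str, exact: bool = False) -> bool:
    """Decide whether a word on a page satisfies a query term.

    exact=True mimics a term typed between double quotes: only the
    literal string matches. Otherwise accents are ignored and a trailing
    plural "s" is tolerated (an assumption made for this sketch).
    """
    if exact:
        return query_term == page_word
    q, w = fold_accents(query_term), fold_accents(page_word)
    return q == w or q + "s" == w or q == w + "s"
```

With this sketch, `matches("phrase", "phrases")` is true while `matches("phrase", "phrases", exact=True)` is false, which mirrors the quoting tip.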

[The spell checker (my own, this time) did not know internaute, the French word for “internet user”, nor incorrecta, of course.]


4 comment(s) for 'Google Belge : problème d’orthographe'

  1. (Comment, 2006-02-18 10:35 )
    #1Carthik

    The VP of engineering at Google gave a talk at UCF recently, and highlighted how Google translates using their huge database of text. Spelling corrections and translations are context-based, not dictionary- or lexicon-based. It appears that Google weighs the number of times word X has been used in the presence of the other words in the search phrase against the number of times word Y has been used, and so on, to deliver the correction suggestions. They do this to be able to suggest corrections to proper nouns. “Kofi Annan” was an example he used, “Britney Spears” was another…

  2. (Comment, 2006-02-18 17:18 )
    #2chris

    That Google should rely heavily on context doesn’t surprise me. I did wonder whether they bother to identify the language of the query, and if so, how it matters. On google.be most queries are bound to be either in French or in Flemish.

    It’d be interesting to know more about these techniques. For example, there are certainly strings that are words in one language (say, Dutch, French, Swedish or Danish), but more frequent, globally, as misspellings of English words. Google doesn’t systematically suggest correcting them, or not on all platforms. Take “littérature” (French, perfectly ok for Google if spelled with e instead of é), “literature” and German “Literatur”. Is the last one a typo by an English speaker or a legitimate query from a German speaker?

    (The spell checker I use in FF, btw., wanted to replace “littérature” with “littérateur”, using the En/UK dictionary.)

  3. (Comment, 2006-02-20 09:37 )
    #3Carthik

    Well, I suppose their techniques afford them insight into the “correct” spelling regardless of the language the word is in. Once the connection is made between what was typed and what might have been intended, Google tries to match it to a language. So if you search for “littérature” (with an e) then Google asks you if you want to limit the search to English only, while listing French results.

    The way I made sense of it, it is a sort of popularity contest for phrases/words. If enough webpages spell a word wrong across a domain, then that automatically becomes right.

  4. (Comment, 2006-03-03 01:15 )
    #4¥€$

    I don’t get this behavior (on fr, be or com); it depends on a lot of variables.
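
The “popularity contest” idea from comments #1 and #3 can be sketched in a few lines: among candidate spellings, prefer the one seen most often alongside the other words of the query. This is only a guess at the general idea as the comments describe it, not Google's actual method. The names `context_score`, `suggest` and `PAIR_COUNTS` are hypothetical, and the counts are invented for the example.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Invented co-occurrence counts: how often two words appear together.
PAIR_COUNTS: Counter = Counter({
    frozenset(("britney", "spears")): 1000,
    frozenset(("britny", "spears")): 12,
})


def context_score(candidate: str, context: Iterable[str], pair_counts: Counter) -> int:
    """Sum how often `candidate` was seen together with each context word."""
    return sum(pair_counts[frozenset((candidate, other))] for other in context)


def suggest(query: List[str], position: int, candidates: List[str],
            pair_counts: Counter) -> Optional[str]:
    """Return a better-fitting replacement for query[position], or None.

    The typed word is listed first, so on a tie it wins and no
    correction is suggested.
    """
    typed = query[position]
    context = query[:position] + query[position + 1:]
    best = max([typed] + candidates,
               key=lambda word: context_score(word, context, pair_counts))
    return None if best == typed else best
```

Under these made-up counts, `suggest(["britny", "spears"], 0, ["britney"], PAIR_COUNTS)` returns `"britney"`, whereas a query that is already the most popular spelling gets no suggestion. No dictionary is consulted, which is why such a scheme can handle proper nouns, and also why a widespread misspelling could win.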