The Belgian Google’s spelling problems
google.be thinks that incorrecta is a French word and that incorrectes is a misspelling, when it is in fact correct.
And a tip to get Google to stop extending the search to inflected forms (plurals, past tenses etc.) of the words you enter: put them between quotation marks.
Via mes requêtes Google (my Google queries):
We can see that google.be tries to "correct" incorrectes to incorrecta, even though incorrectes is perfectly correct. That is surprising, given that it accepts phrase, langue and francaise. So even though the user chose the English interface, the search engine must believe that these are French terms.
The screenshot also shows that Google now performs morphological substitutions in languages other than English: although the user entered phrase, the engine finds pages containing the plural phrases. (To avoid this, just put the term in question between straight double quotation marks: "phrase" will not find pages with the word in the plural.) It also automatically recognizes francaise as a variant of française.
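The behaviour described above (plural matching, accent folding, and quotation marks switching the expansion off) can be illustrated with a small Python sketch. This is only a toy model built on assumptions: the function names fold_accents, naive_variants and matches are hypothetical, the "add an s" plural rule is a deliberate simplification, and nothing here is claimed to reflect how Google actually implements it. In particular, keeping accent folding active inside quotation marks is an assumption, since the post only says that quoting stops the plural from matching.

```python
import unicodedata


def fold_accents(text):
    """Lowercase the text and strip combining marks: 'française' -> 'francaise'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_variants(word):
    """Hypothetical expansion: the folded word plus a naive plural in -s."""
    base = fold_accents(word)
    return {base, base + "s"}


def matches(query_term, page_words):
    """Return True if the page contains the term or one of its variants.

    A term wrapped in straight double quotes is matched as-is (no plural
    expansion); accent folding still applies, which is an assumption.
    """
    folded_page = {fold_accents(w) for w in page_words}
    quoted = (
        len(query_term) >= 2
        and query_term.startswith('"')
        and query_term.endswith('"')
    )
    if quoted:
        return fold_accents(query_term[1:-1]) in folded_page
    return bool(naive_variants(query_term) & folded_page)


if __name__ == "__main__":
    page = ["des", "phrases", "en", "langue", "française"]
    print(matches("phrase", page))      # unquoted: the plural on the page matches
    print(matches('"phrase"', page))    # quoted: only the exact form would match
    print(matches("francaise", page))   # accent-insensitive match
```

A real engine would need language-specific morphology rather than a blanket plural rule, which may be exactly where a suggestion like incorrecta comes from.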
[The spell checker (mine, this time) did not know: l’internaute; and incorrecta, of course.]
The VP of engineering at Google gave a talk at UCF recently, and highlighted how Google translates using its huge database of text. Spelling corrections and translations are context-based rather than dictionary- or lexicon-based. It appears that Google weighs the number of times word X has been used alongside the other words in the search phrase against the number of times word Y has been used, and so on, to deliver its correction suggestions. They do this to be able to suggest corrections to proper nouns. “Kofi Annan” was an example he used, “Britney Spears” was another…
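To make the idea concrete, here is a minimal Python sketch of context-weighted suggestions as I understood them from the talk. Everything in it is an assumption for illustration: the COOCCURRENCE counts are invented, the names context_score and suggest are hypothetical, and the real system is certainly far more elaborate.

```python
from collections import Counter

# Invented co-occurrence counts standing in for a huge text collection:
# (candidate word, context word) -> number of times the two were seen together.
COOCCURRENCE = Counter({
    ("annan", "kofi"): 950,
    ("anan", "kofi"): 12,
    ("incorrectes", "phrases"): 40,
    ("incorrecta", "phrases"): 3,
})


def context_score(candidate, context):
    """Sum how often the candidate was seen with each context word."""
    return sum(COOCCURRENCE[(candidate, word)] for word in context)


def suggest(typed, candidates, context):
    """Return a better-attested candidate for this context, or None."""
    best = max(candidates, key=lambda c: context_score(c, context))
    if best != typed and context_score(best, context) > context_score(typed, context):
        return best
    return None


if __name__ == "__main__":
    print(suggest("anan", ["anan", "annan"], ["kofi"]))
    print(suggest("incorrectes", ["incorrectes", "incorrecta"], ["phrases"]))
```

No dictionary is consulted anywhere, which is how such a scheme can handle proper nouns: the suggestion is simply whichever spelling the collection has seen most often next to the other query words.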
That Google should rely heavily on context doesn’t surprise me. I did wonder whether they bother to identify the language of the query, and if so, how it matters. On google.be most queries are bound to be either in French or in Flemish. It’d be interesting to know more about these techniques. For example, there are certainly strings that are words in one language (say, Dutch, French, Swedish or Danish), but more frequent, globally, as misspellings of English words. Google doesn’t systematically suggest correcting them, or not on all platforms. Take “littérature” (French, perfectly OK for Google if spelled with e instead of é), “literature” and German “Literatur”. Is the last one a typo by an English speaker or a legitimate query from a German speaker?
(The spell checker I use in FF, btw., wanted to replace “littérature” with “littérateur”, using the En/UK dictionary.)
Well, I suppose their techniques afford them insight into the “correct” spelling regardless of the language the word is in. Once the connection is made between what was typed and what might have been intended, Google tries to match it to a language. So if you search for “littérature” (with an e) then Google asks you if you want to limit the search to English only, while listing French results.
The way I made sense of it, it is a sort of popularity contest for phrases/words. If enough webpages spell a word wrong across a domain, then that automatically becomes right.
I don’t get this case (on fr, be or com); it depends on a lot of variables.