MORE SYSTEM IMPROVEMENTS – this time aiming at Catalan and P-Celtic
May 14, 2014
I have made some improvements to Wordlink and Multidict (and hence Clilstore) for Catalan and for the P-Celtic languages: Breton, Cornish and Welsh.
I was trying to make some sense of a Breton website the other day by viewing it with Wordlink and wasn’t getting on very well – partly because I know hardly any Breton(!), partly because the online dictionaries are not very good, and partly because of a difficulty Wordlink had with Breton.
In Breton “c’h” is regarded as a single “letter”, but Wordlink was breaking Breton words such as “boulc’het” into two non-words, “boulc” and “het”. It was treating the non-alphabetic apostrophe character “’” as a word break, just as it does in English words such as “queen’s” and “isn’t”. I have put that right, and Wordlink now treats the sequence “c’h” in Breton (or its ASCII version “c'h”, with a straight apostrophe) as part of a word.
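To illustrate the idea, here is a minimal sketch in Python of a Breton tokeniser. It is not Wordlink's actual code. In it, “c’h”, written with either a curly or a straight apostrophe, counts as part of a word, while any other apostrophe still breaks a word. The pattern `BRETON_WORD` and the function `breton_words` are hypothetical names.

```python
import re

# Hypothetical tokeniser: a Breton "word" is a run of letters in which the
# digraph c'h (straight ' or curly \u2019 apostrophe) is allowed as a unit.
# Any other apostrophe is still a word break, as in English "queen's".
BRETON_WORD = re.compile(r"(?:c['\u2019]h|[^\W\d_])+", re.IGNORECASE)


def breton_words(text: str) -> list[str]:
    """Split text into words, keeping c'h inside words."""
    return BRETON_WORD.findall(text)


if __name__ == "__main__":
    print(breton_words("boulc\u2019het"))  # one word, not "boulc" + "het"
    print(breton_words("queen\u2019s"))    # still split at the apostrophe
```

The digraph alternative is listed before the single-letter class. At a “c” followed by an apostrophe and “h”, the regex engine therefore consumes all three characters together.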
Then I remembered that Catalan has exactly the same problem with “l·l”, so I put that right too. I noticed that “l·l” was also upsetting the Hunspell dictionary headword suggestions feature in Multidict, so I have written a new, general “word recognition” feature for Wordlink and Multidict which could be used for other languages too. It could, for example, be used to treat hyphens as part of a word, or to get Wordlink to treat “queen’s” and “isn’t” in English as single words, though I don’t think that would be a good thing on the whole.
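Here is one way such a general feature might be organised. This Python sketch assumes the feature is driven by a per-language table of word-internal sequences. The table `WORD_INTERNAL`, the function `tokenise` and its `keep_hyphens` option are hypothetical illustrations, not the real interface of Wordlink or Multidict.

```python
import re

# Hypothetical per-language table of sequences that belong inside a word
# even though they contain a non-letter character.
WORD_INTERNAL = {
    "br": ["c'h", "c\u2019h"],  # Breton c'h, ASCII or curly apostrophe
    "ca": ["l\u00b7l"],          # Catalan ela geminada l·l (U+00B7 middle dot)
}

LETTER = r"[^\W\d_]"  # any Unicode letter


def word_pattern(lang: str, keep_hyphens: bool = False) -> re.Pattern:
    """Build a tokenising regex for `lang` from the table above."""
    parts = [re.escape(seq) for seq in WORD_INTERNAL.get(lang, [])]
    if keep_hyphens:
        # A hyphen counts only when it has a letter on both sides.
        parts.append(rf"(?<={LETTER})-(?={LETTER})")
    parts.append(LETTER)
    return re.compile("(?:" + "|".join(parts) + ")+", re.IGNORECASE)


def tokenise(text: str, lang: str, keep_hyphens: bool = False) -> list[str]:
    """Return the words of `text` according to the rules for `lang`."""
    return word_pattern(lang, keep_hyphens).findall(text)


if __name__ == "__main__":
    print(tokenise("col\u00b7lecci\u00f3 intel\u00b7ligent", "ca"))
    print(tokenise("ben-ha-ben", "br", keep_hyphens=True))
```

Keeping the rules in a table means a new language needs only a new entry. A behaviour such as joining hyphenated words can then be switched on per call rather than for every language at once.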
Anyway, it is very good to have Wordlink now treating “l·l” properly in Catalan, because Catalan is one of the POOLS-3 languages and the Catalan team will soon be adding videos and transcripts to Clilstore.
Another problem with Breton, and also with the other two P-Celtic languages, Cornish and Welsh, is that they have a very complicated system of word-initial mutations. For grammatical reasons the Breton word “penn” can change to “benn” or “fenn”, “tad” can change to “dad” or “zad”, and “kalon” to “galon” or “c’halon”. This causes big problems for learners, who find it very difficult to look up words in dictionaries. (Scottish and Irish Gaelic also have initial mutation, but it is less complicated, and the dictionary headword is retained in the spelling, so dictionary lookup is not a problem for learners.)
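A demutation step can be pictured as mapping a mutated initial back to the radical initial(s) it may have come from. This Python sketch covers only the mutations of p, t and k seen in the examples above (penn, tad, kalon). A usable table would need the other Breton mutations as well. The names `DEMUTATIONS` and `demutate_breton` are hypothetical, not Multidict's.

```python
# Hypothetical, deliberately partial reverse-mutation table for Breton:
# mutated initial -> possible radical initials. Values are lists because,
# in a fuller table, one mutated initial can come from several radicals.
DEMUTATIONS = {
    "c'h": ["k"],  # kalon -> c'halon
    "b": ["p"],    # penn -> benn
    "f": ["p"],    # penn -> fenn
    "d": ["t"],    # tad -> dad
    "z": ["t"],    # tad -> zad
    "g": ["k"],    # kalon -> galon
}


def demutate_breton(word: str) -> list[str]:
    """Return candidate headwords for a possibly mutated Breton word."""
    w = word.lower().replace("\u2019", "'")  # normalise the curly apostrophe
    # Check longer initials first so that "c'h" is matched as a unit.
    for initial in sorted(DEMUTATIONS, key=len, reverse=True):
        if w.startswith(initial):
            return [radical + w[len(initial):] for radical in DEMUTATIONS[initial]]
    return []


if __name__ == "__main__":
    for form in ["benn", "fenn", "dad", "zad", "galon", "c\u2019halon"]:
        print(form, demutate_breton(form))
```

The output is only a set of suggestions. A suggestion such as “tad” from “dad” may be wrong if “dad” happens to be a radical form itself, which is one reason to check such suggestions against a dictionary rather than trust them.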
This sounded like an ideal job for the new dictionary headword suggestions feature in Multidict, so I have now programmed “demutation” algorithms for Breton, Cornish and Welsh into it. They are not perfect, I know, and I have made them a “non-priority” feature in Multidict’s set of rules: the suggestions they come up with are listed at the end, to be tried only if lookup of the surface wordform fails. (By contrast, the demutation of Scottish and Irish Gaelic words is so easily recognised and distinctive that it is a priority feature in Multidict.) They should be a big help to learners, though.
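The priority/non-priority distinction can be read as an ordering of the forms to look up. This Python sketch assumes that priority suggestions are tried before the surface form and non-priority ones only after it. The names `delenite_gaelic`, `candidates` and `find_headword` are hypothetical. The lenition rule is deliberately simplified: it deletes an “h” written after the initial consonant, as in bean → bhean.

```python
from typing import Callable, Iterable, Optional

# A suggestion rule maps a surface wordform to candidate headwords.
Rule = Callable[[str], Iterable[str]]


def delenite_gaelic(word: str) -> list[str]:
    """Illustrative priority rule: Gaelic lenition writes an 'h' after the
    initial consonant, so deleting it recovers the headword."""
    if len(word) > 2 and word[1] == "h" and word[0] in "bcdfgmpst":
        return [word[0] + word[2:]]
    return []


def candidates(word: str, priority: Iterable[Rule] = (),
               fallback: Iterable[Rule] = ()) -> list[str]:
    """Priority suggestions, then the surface form, then fallback
    (non-priority) suggestions, with duplicates dropped."""
    ordered: list[str] = []

    def add(forms: Iterable[str]) -> None:
        for form in forms:
            if form not in ordered:
                ordered.append(form)

    for rule in priority:
        add(rule(word))
    add([word])
    for rule in fallback:
        add(rule(word))
    return ordered


def find_headword(word: str, dictionary: set[str],
                  priority: Iterable[Rule] = (),
                  fallback: Iterable[Rule] = ()) -> Optional[str]:
    """Return the first candidate found in `dictionary`, or None."""
    for form in candidates(word, priority, fallback):
        if form in dictionary:
            return form
    return None


if __name__ == "__main__":
    def welsh_soft_c(word: str) -> list[str]:
        """Toy fallback rule: Welsh soft mutation turns c into g (cath -> gath)."""
        return ["c" + word[1:]] if word.startswith("g") else []

    print(find_headword("bhean", {"bean"}, priority=[delenite_gaelic]))
    print(find_headword("gath", {"cath"}, fallback=[welsh_soft_c]))
```

Fallback suggestions only matter when nothing earlier in the list is found. An uncertain demutation therefore cannot hide a surface form that really is in the dictionary.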