MORE SYSTEM IMPROVEMENTS – this time aiming at Catalan and P-Celtic

May 14, 2014

by Caoimhín
(Skye)

I have made some improvements to Wordlink and Multidict (and hence Clilstore) for Catalan and for the P-Celtic languages: Breton, Cornish and Welsh.
I was trying to make some sense of a Breton website the other day by viewing it with Wordlink and wasn’t getting on very well – partly because I know hardly any Breton(!), partly because the online dictionaries are not very good, but partly because of a difficulty Wordlink had with Breton.
In Breton “c’h” is regarded as a single “letter”, but Wordlink was breaking Breton words such as “boulc’het” into two non-words, “boulc” and “het”. It was taking the non-alphabetic character “’” to be a word break (as it does in English “words” such as “queen’s” and “isn’t”). I managed to put that right and get it to treat the sequence “c’h” in Breton (or the ASCII version “c'h”) as being part of a word.
Then I remembered that Catalan has exactly the same problem with “l·l”, so I put that right too. I noticed that the “l·l” was also upsetting the Hunspell dictionary headword suggestions feature in Multidict, so I have written a new general “word recognition” feature for Wordlink and Multidict which could be used for other languages too. It could be used, for example, to treat hyphens as part of a word. Or it could be used to get Wordlink to treat “queen’s” and “isn’t” in English as single words – but I don’t think that would be a good thing on the whole.
Anyway, it is very good to have Wordlink now treating “l·l” properly in Catalan, because Catalan is one of the POOLS-3 languages and the Catalan team will soon be adding videos and transcripts to Clilstore.
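The word-recognition idea described above can be illustrated with a small sketch. This is not Wordlink's actual code: the table `WORD_INTERNAL`, the language codes and the function names are all assumptions made for the example. It only shows how a tokenizer might be told that certain sequences belong inside a word in a given language.

```python
import re

# Hypothetical table: sequences that contain a non-letter but count as
# part of a word in that language (names and codes are assumptions).
WORD_INTERNAL = {
    "br": ["c'h", "c’h"],   # Breton, ASCII and typographic apostrophe
    "ca": ["l·l", "l•l"],   # Catalan, middle dot and the bullet seen in some texts
}

def word_pattern(lang):
    """Build a regex matching one word in the given language."""
    specials = [re.escape(s) for s in WORD_INTERNAL.get(lang, [])]
    # Try the special sequences first, then fall back to any single letter.
    alternatives = specials + [r"[^\W\d_]"]
    return re.compile("(?:" + "|".join(alternatives) + ")+", re.IGNORECASE)

def split_words(text, lang):
    """Return the words found in text, using the rules for lang."""
    return word_pattern(lang).findall(text)

print(split_words("boulc’het", "br"))
print(split_words("boulc’het", "en"))
```

With the Breton rules the word stays whole, while with no special rules (as for English here) the apostrophe still acts as a word break.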
Another problem with Breton, and also the other two P-Celtic languages, Cornish and Welsh, is that they have a very complicated system of word-initial mutations. The Breton word “penn” can change to “benn” or “fenn” for grammatical reasons, “tad” can change to “dad” or “zad”, “kalon” to “galon” or “c’halon”. This causes big problems for learners, and means that they find it very difficult to look up words in dictionaries. (Scottish and Irish Gaelic also have initial mutation, but not so complicated, and the dictionary headword is retained in the spelling, so dictionary lookup is not a problem for learners.)
This sounded like an ideal job for the new dictionary headword suggestions feature in Multidict, so I have now programmed into it “demutation” algorithms for Breton, Cornish and Welsh. They are not perfect, I know, and I have made them a “non-priority” feature in Multidict’s set of rules – i.e. the suggestions they come up with are listed at the end, to be tried only if the surface wordform lookup fails (whereas the demutation of Scottish and Irish Gaelic words is so easily recognised and distinctive that it is a priority feature in Multidict). They should be a big help to learners, though.
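As a rough illustration of what a “demutation” rule set might look like, here is a minimal sketch. It is not the Multidict algorithm: the rule table covers only the Breton examples quoted above (penn, tad, kalon), and the names `BRETON_DEMUTATIONS`, `demutation_suggestions` and `lookup_order` are invented for the example.

```python
# Each pair maps a mutated initial to the radical initial it may come from.
# Only the mutations quoted in the post are included (an assumption of scope).
BRETON_DEMUTATIONS = [
    ("c'h", "k"),
    ("c’h", "k"),
    ("b", "p"),
    ("f", "p"),
    ("d", "t"),
    ("z", "t"),
    ("g", "k"),
]

def demutation_suggestions(word):
    """Return possible dictionary headwords for a possibly mutated word."""
    suggestions = []
    lower = word.lower()
    for mutated, radical in BRETON_DEMUTATIONS:
        if lower.startswith(mutated):
            candidate = radical + lower[len(mutated):]
            if candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions

def lookup_order(word):
    """Surface form first; demutated guesses are non-priority and come last."""
    return [word] + [s for s in demutation_suggestions(word) if s != word]

print(lookup_order("fenn"))
print(lookup_order("c’halon"))
```

Because the guesses are only suggestions listed after the surface form, a rule that is sometimes wrong (many Breton words really do begin with “b” or “d”) costs the learner little.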

Media Tips

April 26, 2014

Island Voices - Guthan nan Eilean

POOLS-3 audio and video presentation

The Island Voices project originated with “Series One” in the 2005-2007 Leonardo-funded European project “POOLS”, and subsequently developed “a life of its own”. Technology and techniques have moved on since those early days, of course, but fundamental principles remain stable, and lessons can still be learned.

“POOLS-3” is a Transfer of Innovation project in which institutions involved in teaching Catalan, Czech, and Irish aim to replicate and develop some of the key outputs from the first POOLS project. At a recent meeting in Barcelona, Gordon Wells gave this brief presentation on approaches to media recording, based on his experiences with POOLS and Island Voices/Guthan nan Eilean.

The world takes notice

April 17, 2014

The message about Clilstore is spreading around the world.

Here’s an example from the British Council Learn English Facebook page (937,000 followers).

MORE UPDATES before the final project meeting

April 6, 2014

By Caoimhín Ó Donnaíle, Tools project software developer

This is very last-minutish, because I am about to leave at mid-day for Yorkshire and then on to Belfast on Monday, but I have now got a file upload facility working in Clilstore.
If you edit an existing unit, you’ll see that below the green link buttons there is now a new button “Files” which takes you to a page for uploading files and managing the files which are associated with the unit.
You can’t put files straight into a new unit as you create it. You have to save the unit first, then edit it to upload the files. Then if you want to link to the files using the green link buttons, you have to write them in as “file:Crossword.htm”, or “file:Worksheet.docx” or similar.
So the new facility is not yet as slick as it could be. As well as that, it is still missing all the error checks and security checks which ought to be built into it. I need to do more work on it. It will hopefully be usable already, though, despite its faults.

In the process of doing this work, I also gave multidict.net its own crash handler, instead of using the one which I use for other SMO work. So when things crash completely and you get a red screen, the error messages will now be in English instead of Gaelic 😉

Le deagh dhùrachd,
Caoimhín

Clilstore software developments

April 4, 2014

Caoimhín Ó Donnaíle, Tools project software developer, never stops improving the Clilstore system, making it more user-friendly each time. Here is a description of his efforts and results.

If you have a look at unit 1835 via the test server:  http://test.multidict.net/cs/1835

you’ll see that I have been making some good progress with giving Clilstore the ability to store files attached to units, rather than authors having to store them on Dropbox or elsewhere.

Unit 1835 is a test unit, which I created by cloning unit 1657 (a unit created by Jan Hardie in Switzerland, a participant in the POOLS-T project).

The first new thing which you might spot is that unit 1835 has 5 green user-defined buttons. The limit used to be 3, and when Gordon and others requested an increase I raised it to 4. Increasing it further would at the time have made the programs and database tables a lot more complicated. However, I have now completely rewritten and improved the programming behind the buttons and you can now have as many as you like by increasing their number one by one. If you already have 4 and edit the unit, it will let you add a 5th. If you have 5 and edit the unit, it will let you add a 6th, and so on. Of course, if you have more buttons you need to keep the text on them shorter or you will run out of screen width.

The next thing to notice is that whereas Jan’s unit 1657 had Hot Potatoes exercises stored in Dropbox, unit 1835 has them stored in Clilstore itself. And even though there are several exercises, the links between them work ok. The files attached to unit 1835 have their own URLs and therefore can be accessed independently of the Clilstore unit:

 

http://test.multidict.net/cs/1835/SMO .jpg

http://test.multidict.net/cs/1835/hp/index.htm

http://test.multidict.net/cs/1835/TestWordFile.docx

 

That means that you can use addresses like this when creating the buttons. Or else you can use shorter versions such as “file:hp/index.htm” (as used behind the scenes by the button “The same exercises”). You can also use these addresses to embed pictures within the units, as I have done with the picture of Sabhal Mòr Ostaig.
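The relationship between the short “file:” form and the full address can be sketched as follows. This is only an illustration: the function name `resolve_link` and the constant `BASE` are assumptions, and the real Clilstore code may well handle the details differently.

```python
from urllib.parse import quote

# Assumed base address, following the example URLs for unit 1835.
BASE = "http://test.multidict.net/cs"

def resolve_link(link, unit_id):
    """Expand a 'file:' shorthand into the full address of an attached file."""
    prefix = "file:"
    if not link.startswith(prefix):
        return link  # ordinary links are left untouched
    path = link[len(prefix):]
    # Percent-encode characters such as spaces, but keep '/' for subfolders.
    return f"{BASE}/{unit_id}/{quote(path)}"

print(resolve_link("file:hp/index.htm", 1835))
```

The attraction of the short form is that it does not mention the unit number, so it would presumably still make sense if the unit were cloned.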

So the new facility could potentially provide great benefits and simplifications. There are two problems, though. One is that I have not yet provided any mechanism for authors to upload files such as these to Clilstore! So the facility is not yet available for use. I uploaded the files in the test unit into the database by hand. However, it should not be too difficult to provide some kind of upload facility for authors. Then we need facilities for authors to rename and delete the files which they upload, and to warn them that if they delete a unit they will delete the files associated with it too. I’ll see how much of this I can get done before I leave on Friday morning for Yorkshire and then Belfast. However, I thought it was worth letting the TOOLS team know about the work so far in case there are any comments or ideas.

The other big problem I can see is potential abuse. Up till now we have only stored the text of Clilstore units, with JavaScript banned in them. Now we are about to allow authors to store practically anything, which gets us into whole new territory. They could store files with JavaScript which could attempt to exploit weaknesses in the user’s computer. They could store Windows .exe files which if executed would attempt to install a Trojan on the user’s computer. If people started using Clilstore to store “nasties”, it would quickly spoil our Internet reputation, reduce our Google rankings and maybe even get us blacklisted by browsers. I’ll certainly need to ban storing .exe files, but I don’t know how to ban all potential problems, or even whether much can be done. We can’t ban all JavaScript because Hot Potatoes depends entirely on JavaScript to work. We’ll need to put stronger checks on new authors – such as insisting that they confirm their email address before they are registered. And we’ll need to put a limit on the size of files which can be uploaded. We could discuss things like this in Belfast.
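The checks mentioned above (banning .exe files, limiting file size, requiring a confirmed email address) could be sketched along these lines. Everything here is hypothetical: the function name, the 5 MB limit and the wording of the messages are placeholders for discussion, and checking the file extension alone is of course only a first line of defence.

```python
import os

# Both values are placeholders for discussion, not decisions from the post.
BANNED_EXTENSIONS = {".exe"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

def check_upload(filename, size_bytes, email_confirmed):
    """Return a list of reasons to reject an upload (empty list = accept)."""
    problems = []
    if not email_confirmed:
        problems.append("author has not confirmed their email address")
    extension = os.path.splitext(filename)[1].lower()
    if extension in BANNED_EXTENSIONS:
        problems.append(f"files of type {extension} are not allowed")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than the upload limit")
    return problems

print(check_upload("index.htm", 20_000, True))
print(check_upload("Setup.EXE", 20_000, True))
```

Returning all the problems at once, rather than stopping at the first, would let an upload page tell the author everything that needs fixing in one go.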

Le deagh dhùrachd, Caoimhín (Skye)

CLILSTORE: NEW APPLICATION. CLILSTORE USED TO TRAIN CONFERENCE INTERPRETING

March 24, 2014

It can be said that, by mere accident, Clilstore, along with the project software and the units, has found an entirely new application! They are used at Marijampole College to teach conference interpreting to students of the Applied Foreign Languages educational programme. Marijampole College is an institution of tertiary education providing professional BA degrees in different programmes of social and applied sciences as well as several technical fields.

It started with dissemination: it had been planned since the proposal stage of the project that one of the groups with which the Clilstore units would be piloted and then used would be students of the Marijampole College teacher training department. This was successfully completed, and the software was so interesting that the Dean offered to try it with the students of business English. The course of conference interpreting is rather short, and the students are granted three credits after taking an exam. In fact this was the first course ever, as the programme is rather new and the third-year students are the first ones to graduate from this course this year. It was a challenge to prepare something new and catchy for students who were well known among the staff for their lack of motivation and interest in studies. One of the things to make them come and pass the course is cumulative scoring, so that even those who have missed several classes are able to take the exam by presenting individual work.

Everybody who is learning or teaching languages knows that this process requires a lot of individual work, i.e. you have to do homework, while students nowadays (things were really different in my time!!!!:)) rely mostly not on their memory but on the Internet, which makes language learning simply impossible! Thus, if you want to catch their interest, you must do it with something original, something they have never experienced before.

And voilà! Here we have the Clilstore, something easy to use and really attractive! The tool is just perfect for teaching conference interpreting, as I discovered after some research on the web. Quite a few interesting courses can be found online, including an international project of Vilnius University together with some other prominent HE institutions of Europe, not to mention the resources of DG Interpretation http://ec.europa.eu/dgs/scic/ of the EC, who use video for training new interpreters. However, we are talking about students whose vocation is not necessarily interpreting or translation! They have a somewhat more limited vocabulary and their fluency is far from that of high-level professionals aiming to BECOME interpreters! Choosing a complicated video wouldn’t work if the students are not equipped with the appropriate amount and variety of vocabulary; they would simply lose interest. Clilstore, by contrast, gives the students the possibility to work individually at the right pace and then to create their own units as an individual task.

The story has just begun, but I can see how the courses (of applied English) and the Tool of the Tools4Clil project fell in love with each other. This had to happen: an entirely new application of the project tool is something a developer wouldn’t even have dreamed of!

Big improvement to Multidict for some languages

March 3, 2014

 by Caoimhín (Skye)

I hit on the “hunspell” improvement to Multidict almost by accident. I had felt, ever since the days of the POOLS-T project which first developed Wordlink and Multidict, that the main thing which the facility lacked was some ability to do “lemmatization” – to change a wordform which you click on into a dictionary headword for looking up in an online dictionary. The Greek partners in the POOLS-T project in particular complained that Wordlink only ever succeeded in finding the occasional Greek word in the dictionary, and the Swiss-Italian partners had the same complaint to a lesser extent. The only reason that Wordlink works so successfully for English texts is that English has very few inflected forms (wordforms such as “running”, “distancing”, “distanced”) compared to most languages. And also that English is a big enough, rich enough language that many of the online dictionaries (with some notable exceptions such as Etymonline) have inbuilt lemmatization. But although I hoped sometime to try and add this capability to Multidict for many languages, I thought that it would be a huge amount of work.

I happened to be talking to Mìcheal Bauer, a linguist who has done so much for Scottish Gaelic on the Internet. He also speaks Basque and a good Basque dictionary had “stopped working with Wordlink”. In fact, it had merely changed its search parameters (for the better) and I soon put things right, but I mentioned to Mìcheal in passing that none of the Basque dictionaries worked very well anyway with Wordlink because Basque is a highly inflected language and the dictionaries do not do lemmatization. Mìcheal pointed me to the excellent Basque implementation of hunspell which contains lots of Basque inflexion rules in its “eu.aff” file. Hunspell was first developed for Hungarian, another highly inflected language, and instead of just relying on a huge wordlist in a .dic file like old-fashioned spell-checkers, it can make clever use of lots of complicated inflexion rules for the language in a .aff file. (The “.aff” stands for “affix”.)

I started trying to decipher and understand the mathematical-looking inflexion rules in the eu.aff file, but when I read up more about hunspell I found I didn’t have to bother! Hunspell does more than just spellchecking. It does lemmatization too: if you give it a wordform it will give you back a dictionary headword (or several headwords if there are several possibilities). I soon put it into service for Basque in Multidict, and Mìcheal Bauer declared it to be a big improvement on the whole. I had visions of hunspell giving us lemmatization “for free” for lots of languages.

My initial excitement soon turned to disappointment, though. I tried hunspell for Arabic, and although the “lemma” it came up with was sometimes very good and was found in the dictionary whereas the original wordform was not, it more often turned out that we would have been better just sticking to the original wordform. Most other languages were intermediate: sometimes the original wordform was better, sometimes the lemma from hunspell was better. What we needed was a mechanism which would give the user the best of both worlds. That is why I came up with the brown list of suggestions and the “click again” mechanism which gives control to the user.
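The “best of both worlds” idea can be sketched in a few lines. This is an illustration only, not Multidict's code: `build_suggestions` and `toy_lemmatizer` are invented names, and the toy lemmatizer simply stands in for hunspell or a lemmatization table.

```python
def build_suggestions(wordform, priority_sources, fallback_sources=()):
    """Collect headword suggestions, keeping the clicked wordform first.

    Each source is a function taking a wordform and returning a list of
    candidate headwords (for example a stemmer or a lookup table).
    """
    ordered = [wordform]
    for source in list(priority_sources) + list(fallback_sources):
        for candidate in source(wordform):
            if candidate not in ordered:
                ordered.append(candidate)
    return ordered

# Stand-in for a real lemmatizer; the single entry is only for demonstration.
def toy_lemmatizer(wordform):
    table = {"running": ["run"]}
    return table.get(wordform, [])

print(build_suggestions("running", [toy_lemmatizer]))
```

The original wordform is always tried first, and the user can “click again” to move on to the next suggestion in the list if the first lookup fails.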

There isn’t currently much explanation in Multidict to guide the user on the new facility, but the problem is that space is at a premium in the Multidict navigation frame and we don’t want to clutter things up or confuse new users. I would hope to put an explanation of the facility in the Help file. And hopefully most users will see how it works simply by experimentation.

How well the new mechanism works for a language depends on how clever the hunspell implementation is for that language. For some languages, such as Basque and also Lithuanian by the looks of things, the .aff file has lots of inflexion rules built into it and the mechanism works very well. For others such as German the hunspell implementation still relies on a huge old-fashioned wordlist in the .dic file and does us little good. New hunspell implementations are appearing all the time, though, so we can look out for better ones as they appear and pull them in.

For Scottish Gaelic, hunspell turned out to be of hardly any use, and for Irish Gaelic not much better. However, Mìcheal Bauer generously gave me a huge lemmatization table he had built for Scottish Gaelic and I have added this into the mechanism. Likewise, Kevin Scannell in the US generously gave me a huge lemmatization table he had built for Irish Gaelic and I have thrown this in too. I threw in too a huge public domain lemmatization table for Italian which I found back in the days of the POOLS-T project and had stored ever since in the hope that it would be useful some day. So for all these languages the new mechanism is I think working particularly well. All other languages currently rely only on hunspell. (Actually, for the aficionados, I have also thrown in a table of Old Irish irregular verbforms.)

For Scottish and Irish Gaelic, I have moved the old rules for the removal of initial mutations into the new mechanism. So instead of the old system which converted “shoilleir” to “soilleir”, “thart” to “tart”, “tsaol” to “saol”, “bhfuar” to “fuar”, “bhfuilimid” to “fuilimid” willy-nilly and sometimes made things worse instead of better, the new system gives control to the user. For Irish Gaelic I notice that even for some dictionaries such as FGB which now give good lemmatization suggestions, the new Multidict mechanism is so slick that it is often quicker and easier just to click again and let Multidict do the work.

Although it is wonderful what hunspell has given us for free, it is not perfect for our task. In particular it is not good for lemmatizing common irregular verbs or irregular noun inflexions. Hunspell’s aims and ours are different. Hunspell’s aim in its .aff file is to supply the inflexion rules for regular verbs and regular nouns and thereby save the space which would otherwise be taken up in the .dic file by hundreds of thousands of regularly inflected wordforms. It is not bothered about the small amount of space taken up by irregular wordforms, so it just throws them all into the .dic file and therefore cannot lemmatize them. That is why it does not suggest “estar” when we give it the Portuguese verbform “estamos”. We could solve much of this problem by feeding Multidict a small table of irregular verbforms for each language – but that is something for another project.

There are lots of other possibilities. The new mechanism is super flexible and can handle lemmatization suggestions from algorithmic rules as well as those from hunspell and from a lemmatization table. As an experiment, I have thrown in a rule to try removing a final ‘s’ from English words – so that if you click on “transducers” Multidict will suggest “transducer” even though hunspell does not know this word. We could add many such rules for many languages. The beauty of the new mechanism is that since we are only providing the user with suggestions, the rules do not have to always be perfect. We could add in a facility to break words into component words so that if you give it German “Infobahn” it will suggest “Info” and “Bahn”. Or we could give it a facility to convert between closely related languages, so that if you give it Irish Gaelic “scáthach” it can automatically suggest trying “sgàthach” in Scottish Gaelic dictionaries. Similarly for Danish and Norwegian, Spanish and Portuguese, Czech and Slovak perhaps.
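To make the difference between a rule-based source and a table-based source concrete, here is a minimal sketch of each. Neither is taken from Multidict: the function names and the one-entry table are assumptions, and the refusal to strip the ‘s’ from words ending in “ss” is an extra guard added for the example rather than something described in the post.

```python
def strip_final_s(wordform):
    """Algorithmic rule: suggest an English word without its final 's'."""
    # The 'ss' guard (e.g. for 'glass') is an assumption made for this sketch.
    if len(wordform) > 2 and wordform.endswith("s") and not wordform.endswith("ss"):
        return [wordform[:-1]]
    return []

# Table-based source: a tiny hypothetical table of irregular verbforms,
# holding only the Portuguese example mentioned in the post.
IRREGULAR_FORMS = {"pt": {"estamos": ["estar"]}}

def irregular_lookup(wordform, lang):
    """Return headwords for an irregular wordform, or an empty list."""
    return IRREGULAR_FORMS.get(lang, {}).get(wordform.lower(), [])

print(strip_final_s("transducers"))
print(irregular_lookup("estamos", "pt"))
```

Both functions return a list of candidates, possibly empty, so either kind of source can feed the same list of suggestions.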

The possibilities the new mechanism offers are numerous and exciting. They need working through for each individual language. But this will have to wait for future years and perhaps future projects, because there are lots of other things we ought to try and do yet in the TOOLS project: hopefully giving Clilstore the ability to store exercise files itself for one thing, so as to remove the need to store them separately on Dropbox. But to my mind the easy opportunity unexpectedly presented by hunspell was too good to miss.