Dreaming of an online reading platform for Gaelic Learners

I was very interested in, and a little surprised by, the recent announcement from the Gaelic Algorithm Research Group of a Gaelic Linguistic Analyser which performs Part-of-Speech tagging, lemmatisation, and syntactic parsing. Surprised, because I knew of the earlier work on automatic PoS tagging, but did not realise that things had developed so considerably since then.

For quite a few years, I used Foreign Language Text Reader (FLTR) for my Gaelic reading; it allowed me to tag and store glosses and other data for individual words and multiword expressions. When migrating to a new computer recently, I sadly lost all my stored FLTR data.

In the work I have been contributing to the Greek Learner Text Project, and in my many discussions with James Tauber, a lot of our shared interest keeps coming back to a small set of questions:

  1. How do you help learners read more text, more easily?
  2. How do you select appropriate texts for readers?
  3. How do you build a platform that overcomes the difficulties of reading?

Those discussions often move cyclically from pedagogy to interfaces to data. In an ideal world, I think, you would have a reading interface/platform that (a) gave pop-up information on all the words and phrases you needed help with, (b) had accurate tagging and data for all the words in lots of texts, (c) tracked the words (and structures, syntax, etc.) you had been exposed to, not just as a binary know/don’t know, but as number of exposures, number of times you’d needed to click for help, time since last exposure, and so on, and (d) could suggest new texts that required only minimal steps of new vocabulary (or structures), ideally keeping you reading at a 98% recognition level.
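To make point (d) a little more concrete, here is a minimal sketch, assuming the platform already stores a per-learner exposure count for each lemma (point (c)) and that each text has been lemmatised. The names `known_lemmas`, `coverage` and `suggest_texts`, and the rule that a lemma counts as "known" after 3 exposures, are hypothetical illustrations, not an existing API. For each candidate text it computes the proportion of running tokens the learner already recognises, keeps those at or above 98%, and ranks them by how few new lemmas they introduce.

```python
# Hypothetical threshold: a lemma counts as "known" after this many exposures.
KNOWN_AFTER = 3
# Target recognition level mentioned above.
TARGET_COVERAGE = 0.98


def known_lemmas(exposures: dict[str, int], threshold: int = KNOWN_AFTER) -> set[str]:
    """Return the lemmas the learner has met at least `threshold` times."""
    return {lemma for lemma, count in exposures.items() if count >= threshold}


def coverage(text_lemmas: list[str], known: set[str]) -> float:
    """Proportion of running tokens in a text whose lemma is already known."""
    if not text_lemmas:
        return 1.0
    hits = sum(1 for lemma in text_lemmas if lemma in known)
    return hits / len(text_lemmas)


def suggest_texts(texts: dict[str, list[str]], exposures: dict[str, int],
                  target: float = TARGET_COVERAGE) -> list[tuple[str, float, int]]:
    """Rank texts that meet the coverage target, fewest new lemmas first.

    Returns (title, coverage, number of distinct unknown lemmas) tuples.
    """
    known = known_lemmas(exposures)
    results = []
    for title, lemmas in texts.items():
        cov = coverage(lemmas, known)
        new_items = len(set(lemmas) - known)
        if cov >= target:
            results.append((title, cov, new_items))
    # Prefer texts introducing the fewest new lemmas, then the highest coverage.
    return sorted(results, key=lambda r: (r[2], -r[1]))
```

For example, with exposures `{"bean": 5, "agus": 10, "cù": 1}`, a text of 99 tokens of *bean* and one of *cù* has 99% coverage and one new lemma, so it clears the 98% bar. Coverage is counted over running tokens, while "new vocabulary" is counted over distinct lemmas, since one unfamiliar word repeated ten times is still a single learning step.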

This requires tools, such as those being developed for the Greek Learner Texts (which are generally language-independent); a platform, such as the one being developed for Hedera; and a corpus with relevant data, and/or the ability for learners to import their own texts. A digital corpus of Gaelic texts, DASG, already exists, though it does not appear to be open access at all, nor is it clear what data is associated with it.

All of which is to say, I think there is now enough convergence of tools and resources that a learner-oriented Gaelic reading platform, with a database of texts, is more within reach than ever before. Two particular obstacles remain, however. Firstly, the PoS tagger is 91% or 95% accurate, depending on whether it uses the full or the simplified tagset. Hand-curating the tagging would help here, and feeding manually corrected tags back to the GARG would probably allow the tagger to improve over time. For the meantime, starting with computer-tagged texts and correcting them remains necessary. I had previously made a small start on hand-tagging some texts, but it is very laborious; correcting computer-tagged texts should be a lot faster. Secondly, the copyright status of texts is an issue. For Ancient Greek, our great advantage is that the texts were authored millennia ago, and many print editions are out of copyright. Providing contemporary Gaelic texts will require specific permissions. It would be great to see producers of publicly available material (e.g. LearnGaelic.scot) include licensing permission for reuse of their texts in a project like this.
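On the first obstacle, checking whether hand-correction is paying off is simple arithmetic: compare the analyser's tags with the corrected tags token by token, and tally which mistakes recur. A minimal sketch, assuming both versions are available as parallel lists of (word, tag) pairs; the function name and the tags in the example are illustrative, not the Linguistic Analyser's actual output format or tagset.

```python
from collections import Counter


def tagging_accuracy(auto: list[tuple[str, str]],
                     gold: list[tuple[str, str]]) -> tuple[float, Counter]:
    """Compare automatic tags with hand-corrected ("gold") tags.

    Both inputs are parallel lists of (word, tag) pairs for the same tokens.
    Returns the proportion of tokens tagged correctly, plus a Counter of
    (automatic tag, corrected tag) confusions worth reporting back.
    """
    if len(auto) != len(gold):
        raise ValueError("automatic and corrected tokens must align one-to-one")
    confusions = Counter()
    correct = 0
    for (word_a, tag_a), (word_g, tag_g) in zip(auto, gold):
        if word_a != word_g:
            raise ValueError(f"token mismatch: {word_a!r} vs {word_g!r}")
        if tag_a == tag_g:
            correct += 1
        else:
            confusions[(tag_a, tag_g)] += 1
    accuracy = correct / len(gold) if gold else 1.0
    return accuracy, confusions
```

At the reported 91% accuracy on the full tagset, a hand-corrector should expect to fix roughly nine tokens in every hundred; at 95% on the simplified tagset, about five.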

For my part, I plan to use the new Linguistic Analyser to start analysing some texts and producing some curated datasets of my own, and then to test and integrate them with tools from the Greek Learner Text Project.

If you’d be interested in collaborating on any of this from the Gaelic side, please do get in touch: thepatrologist @ gmail.com

Diary of a Digital Apprentice (2): First, a Unix tutorial

(Here for the blog-series kick-off post).

We’re playing catch-up a little, and these are things I did in the tail end of 2017.

It’s been a long time since I’ve done anything with Unix, about 10 years actually, and my experience then was limited to running Ubuntu and being forced to troubleshoot a lot of things, mainly by googling for answers. That was frustrating and satisfying at the same time. A memorable highlight was the time my system switched to Ancient Greek at some fundamental level, so that I couldn’t log in: it would only input Greek characters, and fixing it was not as simple as ‘change keyboard’.

Anyway, Jedi master Tauber decided I should learn to manipulate text files in Unix and set me the following tasks. You can see them over here:

This is what I call “hunt”-learning. The teacher isn’t pushing, and the learner isn’t actively trying to pull things from the teacher, rather the teacher is setting up tasks which the learner must then go and problem-solve. I think there’s a lot to be said for such a method, and it works particularly well for something like this.

Also, by the end of the 7 tasks, I had not only an appreciation for how to do these things, but also a sense of (a) the kinds of things that can be done just by manipulating appropriate data sets, and (b) how much is possible if you just have the data.

Of course, having the data, or having a text in an actionable form, is itself half the struggle.

If you’re a total beginner like me and want to work through those 7 tasks, go ahead, and feel free to drop me a line if you get stuck. There’s lots I don’t know, but I know enough to hint you along the path.
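The tasks themselves aren't reproduced in this post, but as a flavour of what "manipulating appropriate data sets" can mean, here is a minimal Python sketch of a word-frequency count, roughly the job done by the classic Unix `sort | uniq -c | sort -rn` idiom. It is an illustration of the genre, not one of the 7 tasks; the function name is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """Count words (case-folded) and return them, most frequent first."""
    # In Python 3, \w+ is Unicode-aware: it matches accented letters such as
    # à, è, ì, ò, ù, so Gaelic words stay whole instead of being split.
    words = re.findall(r"\w+", text.lower())
    # most_common() sorts by count; ties keep first-seen order.
    return Counter(words).most_common()
```

Called as `word_frequencies(open("text.txt", encoding="utf-8").read())[:10]`, it would give the ten most frequent words in a file; the file name is, of course, just a placeholder.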

Diary of a Digital Apprentice (1)

One of my goals for 2018 is to acquire a working skillset in areas of Digital Humanities. As I do so, I plan to blog regularly on that ‘mission’. In today’s post, I provide some context for the start of that journey.


I’d say I’ve long had a user-side interest in Digital Humanities. I’ve appreciated, and used, the considerable resources that things like Perseus, TLG, PHI, and other packages have presented. And I’ve always envisioned ‘more’ being possible. But, as I’m relatively short on the technical side of things, DH has always been a bit of black-box wizardry to me.

A couple of years back I made the acquaintance, first digitally, of James Tauber. Some of our initial overlap and discussion had to do with tools for language learning and teaching. We met briefly at AARSBL in 2015 and have conversed a bit more since then. Another face-to-face meeting at AARSBL in 2017 helped solidify things, and we have launched both some collaboration and some apprenticing.

That ‘looks like’ two things. Firstly, a combination of push-learning, pull-learning, and hunt-learning. Pull, where I ask, “how do we do X?” or “is Y possible?” and then get a crash course on how to make certain things happen. Or an explanation of “yes, Y is possible, look, Dr ABC has been working on this for umpteen years, see!”. Push-learning is where you learn things you didn’t know you could learn, e.g. “Hey, Seumas, did you know you can use E to accomplish F, G, and H!” And hunt-learning is when James says something like, “Seumas, figure out how to do M, N, O, and P, and then tell me how you did it or when you get stuck.”

Part of this relates to the work that Eldarion is doing on developing the Scaife Viewer for Perseus. This is incredibly exciting because (a) Perseus! (b) have you seen the Scaife Viewer demo’d? and (c) it’s great to see inside the black box, so to speak: to see how something like this gets developed and to figure out how it works.

Another side of it is my Digital Nyssa project for the year.

“Digital Nyssa” is my project to curate/shepherd a text (initially just one, but maybe more) through an open, free, digital pipeline from print to digital edition. It’s a means of acquiring practical DH skills across a range of tools (OCR, TEI-XML markup, PoS and morph tagging, digital edition creation, and then commentary/annotation/translation). You’ll be hearing more about it as the year goes on, and I’ll outline it a little more next week.

So, each week I’ll be posting a bit of what I’ve been doing/learning/working on, as part of a bigger project to self-document the learning process, and hopefully to encourage others that DH is not so scary. The first few posts will also play some catch-up on things from the past few weeks.