Project: Shepherding a text from print to digital

One of my projects for 2018 is to take a text and shepherd it, or curate it, all the way through an open-source pipeline from ‘print’ to ‘digital edition’. This is part of my 2018 year of digital humanities. Here I sketch the process I have in mind.

The text I have in mind is quite short, just over 2000 words. It’s Gregory of Nyssa’s De Deitate adversus Evagrium (in vulgo In suam Ordinationem). I’ve done some work on De Deitate Filii et Spiritus Sancti and this will be a nice complement to that.

My checklist of things to do:

The Pipeline

Step 1: OCRing a print text
Step 2: Correcting the OCR output
Step 3: Creating a TEI-XML version
Step 4: PoS tagging, lemma tagging, and morphological tagging
Step 5: Producing a translation
Step 6: Alignment
Step 7: Annotations and commentary
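To make Step 3 a little more concrete, here is a minimal sketch of how corrected OCR output might be wrapped in a bare-bones TEI-XML document using only Python's standard library. It is an illustration under assumptions, not the method this project has settled on: the function name `build_tei`, the placeholder header wording, and the choice to number paragraphs with an `n` attribute are all hypothetical, and a real edition would need a much fuller header and finer-grained markup.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; every TEI element lives in it.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def build_tei(title, paragraphs):
    """Wrap plain-text paragraphs in a minimal TEI document (hypothetical helper)."""
    # Serialize TEI as the default namespace so the output has no prefixes.
    ET.register_namespace("", TEI_NS)

    def q(tag):
        # Qualify a tag name with the TEI namespace.
        return f"{{{TEI_NS}}}{tag}"

    root = ET.Element(q("TEI"))

    # A TEI header needs a fileDesc with title, publication and source statements.
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished draft."  # placeholder wording
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = "Transcribed from corrected OCR output."  # placeholder

    # The text itself: one <p> per paragraph, numbered for later alignment.
    text = ET.SubElement(root, q("text"))
    body = ET.SubElement(text, q("body"))
    for number, paragraph in enumerate(paragraphs, start=1):
        p = ET.SubElement(body, q("p"), {"n": str(number)})
        p.text = paragraph

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder strings stand in for the corrected Greek text.
    sample = ["First corrected paragraph.", "Second corrected paragraph."]
    print(build_tei("De Deitate adversus Evagrium", sample))
```

Numbering the paragraphs up front is a deliberate choice in this sketch: stable identifiers in the XML give the later tagging and alignment steps something to point at.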

Then, voilà, an open-sourced text, freely available with useful data attached. Half of these things I don’t actually know how to do yet. Maybe more than half. That’s part of the fun. And, presuming it goes well, this will serve as a pilot project for taking future texts through a similar pipeline.

5 responses

  1. I have had a strong desire for quite some time now to look into doing some of this type of work as well, but it sits low enough on the priority list that I’ve not gotten around to it yet. I am pondering a research project as part of my MDiv that might neatly intersect with a project like this. (Actually, my real desire is to move past just the digitisation towards an open/free online database that allows searching this type of material.)

