Sunday, January 5, 2014

Open Greek and Latin Project



Via Gregory Crane
The Open Greek and Latin Project proposes to provide an extensive foundation for Open Access and Open Data research in the study of Greek and Latin. It will create a collection, available under a CC-BY-SA license, of works ranging from the Homeric epics of the mid-eighth century BCE through the 20th century CE. The collection will draw on public domain editions, which under German law will include, by 2017, editions published as late as 1991.
Building directly on more than 25 years of continuous research and development by the Perseus Digital Library, and on recent breakthroughs in OCR for Classical Greek, Open Greek and Latin proposes a new collection with the following layers:
(1) c. 3 billion words of Greek and Latin, produced by OCR optimized for those languages from public domain books that have library metadata;
(2) c. 1 billion words of Greek and Latin with reasonable metadata for date of composition and/or drawn from reasonably datable texts;
(3) 500 million words of text with metadata identifying where each FRBR work appears within the pages of one or more printed books, including multiple editions of every major Classical Greek and Latin source;
(4) 200 million words of corrected OCR source texts, evenly divided between materials produced through 600 CE and post-classical sources, including corrected transcriptions of the textual notes and level 3 TEI XML encoding with at least one established citation scheme for the reconstructed text;
(5) automatically generated metadata for all texts, and curated metadata for as much of the collection as possible (including classification and identification of named entities, textual variants and manuscript witnesses, lemmatization and morpho-syntactic analysis, and identification of text reuse).

Preliminary work conducted in the United States from 1987 to the present, and at the University of Leipzig during 2013 and 2014, has laid the foundations for each component of this work. The current proposal requests support for an initial three-year period to accomplish the first half of this work.
