[artinfo] The Rosetta Project

Jonathan Prince jonathan@killyourtv.com
Sat, 24 Feb 2001 23:56:09 +0100


Fifty to ninety percent of the world's languages are predicted to
disappear in the next century, many with little or no significant
documentation. Much of the work that has been done, especially on smaller
languages, remains hidden away in personal research files or poorly
preserved in under-funded archives.

As part of the effort to secure this critical legacy of linguistic
diversity, the Long Now Foundation is working to develop a contemporary
version of the historic Rosetta Stone. In this updated iteration, our goal
is a meaningful survey and near-permanent archive of 1,000 languages. We
have three overlapping motivations in the project:

To create a uniquely valuable platform for comparative linguistic
research and education.

To develop and widely distribute a functional linguistic tool that will
help with decipherment and recovery of lost languages in distant futures.

To offer an aesthetic object that suggests the great diversity of human
languages as well as the very real threats to the continued survival of
this diversity.

Our 1,000-language corpus expands on the parallel-text structure of the
original Rosetta Stone by archiving seven distinct components for each of
the 1,000 languages. We have selected these components as the "minimum
representation" most likely to be useful for future linguistic
archaeology as well as contemporary comparative research. This sketch
should be understood as a modest frame that can be completed for a
very large number of languages - a frame on which more will hopefully be
hung later.

The seven components are:

Meta-data/description for each language: Origin and current distribution
of language, number of speakers, family, typology, history, etc.

Main parallel text: We are using translations of Genesis Chapters 1-3 as
Biblical texts are the most widely and carefully translated writings on
the planet.

Vernacular origin story with interlinear gloss: A culturally specific
counterpoint to the Genesis text with grammatical analysis. We will
substitute other vernacular texts if a glossed origin story is unavailable
or culturally inappropriate.

Swadesh 100-word vocabulary list: A core word list typically collected in
linguistic fieldwork.

Orthography: The writing system(s) of the language with pronunciation

Inventory of Phonemes: The basic sound units of the language.

Audio file: Sample of spoken language with transcription and ideally a

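For contributors who want to picture what one archive entry might hold, the seven components above can be sketched as a simple record. This is only an illustration under stated assumptions: the class name `LanguageRecord`, its field names, and the `missing_components` helper are hypothetical and are not the project's actual schema or tooling.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LanguageRecord:
    """Hypothetical per-language entry; all names here are illustrative."""
    name: str
    # Origin, distribution, number of speakers, family, typology, history
    metadata: Optional[str] = None
    # Main parallel text: translation of Genesis chapters 1-3
    genesis_text: Optional[str] = None
    # Glossed vernacular origin story, or a substitute vernacular text
    vernacular_text: Optional[str] = None
    # Swadesh core vocabulary list (100 entries when complete)
    swadesh_list: List[str] = field(default_factory=list)
    # Description of the writing system(s)
    orthography: Optional[str] = None
    # Basic sound units of the language
    phoneme_inventory: List[str] = field(default_factory=list)
    # Reference to a recording of the spoken language
    audio_sample: Optional[str] = None

    def missing_components(self) -> List[str]:
        """Return the names of components that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name)
        ]


# Example: an entry with only a description and the Genesis text filled in.
record = LanguageRecord(
    name="Example language",
    metadata="placeholder description",
    genesis_text="placeholder translation",
)
print(record.missing_components())
```

A record like this makes it easy to see at a glance which of the seven components a given language still lacks.
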
We have finished collecting Genesis translations for 1,000 languages and
have parsed the Ethnologue for the corresponding language descriptions. We
now need text contributions for all the remaining components and invite
you to submit material in your area of expertise. We also encourage
suggestions for languages that are not currently on the list but should
be, given interesting structural features, genetic relationships, isolate
status, etc.

--

Jonathan Prince
http://KillYourTV.com  - it's bad for you
http://GWBushSucks.com - he's bad for everyone
http://USoutofColombia.org - stupid wars are bad
"More than any time in history, mankind faces a crossroads.
One path leads to despair and utter hopelessness.
The other, to total extinction. Let us pray we have
the wisdom to choose correctly."   - Woody Allen