Our long-term plan is to collaborate in producing digital documentation of a large number of minor languages of India. We avoid the more common term ‘minority languages’ because, in the Indian context, it takes on a different meaning, suggesting religious as well as linguistic minorities. The term ‘lesser-known languages’ is used here instead.
In the initial stages, we intend to concentrate on creating digital documentation of lesser-known languages of the northern and north-eastern parts of India. The main motivations for choosing this geographical area are:
(i) This is one of the least documented regions; linguistic documentation of many of the languages spoken here is completely lacking;
(ii) The Government of India places special emphasis on developing these regions through special drives in different spheres of activity, from economic to educational. As there has not been any linguistic documentation of the languages spoken in this region, the kind of descriptions undertaken here will benefit literacy and education planning for these mother-tongue speakers, and will also help promote these languages in writing.
(iii) These languages offer rich phonetic inventories and interesting sound structure as well as other typological features.
Documentation of each language will include the following:
A. A brief description of its genetic affiliation and major typological characteristics;
B. An outline of its sociolinguistic situation (e.g., the size of the speech community, the extent to which the community is monolingual, the literacy situation, etc.);
C. Photographs and video recordings to place the linguistic material in context;
D. A brief description of previous work on the language and its culture;
E. A description of the grammatical terminology, the abbreviations used and what they stand for, and the transcription conventions used;
F. Directly elicited data on lexical semantics, kinship terms, numerals, and paradigms (inflectional and derivational morphology);
G. An annotated narrative corpus (together with audio and video recordings, to the extent possible) with the following information:
1. a unique reference field (including information on the informant and the context in which the data were collected);
2. phonetically transcribed units, generally in terms of a clause or a sentence;
3. morphological representation of the clause or sentence-like unit;
4. a morpheme-by-morpheme gloss or rough translation into English;
5. a free translation of the clause into English.
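
The five annotation tiers listed under G can be pictured as a single record per clause or sentence-like unit. The following is only a minimal sketch in Python of how such a record might be stored and checked; the class name, the field names, the reference format and the hyphen-based segmentation are all assumptions made for illustration, not part of the plan, and the sample forms are invented placeholder tokens rather than data from any real language.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CorpusRecord:
    """One annotated clause or sentence-like unit (hypothetical schema)."""
    ref: str            # 1. unique reference
    informant: str      #    who produced the utterance
    context: str        #    setting in which the data were collected
    transcription: str  # 2. phonetic transcription of the unit
    morphemes: str      # 3. morphological representation, hyphen-segmented
    gloss: str          # 4. morpheme-by-morpheme gloss
    translation: str    # 5. free translation into English

    def aligned(self) -> List[List[Tuple[str, str]]]:
        """Pair each morpheme with its gloss, word by word."""
        words = self.morphemes.split()
        glosses = self.gloss.split()
        if len(words) != len(glosses):
            raise ValueError(f"{self.ref}: word count differs from gloss count")
        pairs = []
        for word, word_gloss in zip(words, glosses):
            m, g = word.split("-"), word_gloss.split("-")
            if len(m) != len(g):
                raise ValueError(f"{self.ref}: morpheme/gloss mismatch in {word!r}")
            pairs.append(list(zip(m, g)))
        return pairs


# Placeholder data only: invented tokens, not forms from a real language.
record = CorpusRecord(
    ref="LANG-N01-0001",
    informant="speaker A",
    context="narrative session",
    transcription="tala kimi",
    morphemes="ta-la kimi",
    gloss="go-PST child",
    translation="The child went.",
)
print(record.aligned())
```

Keeping the morphological tier and the gloss tier as parallel strings makes it possible to detect, at entry time, a unit whose segmentation and gloss have drifted out of step; a project could equally well store the tiers as lists or in an existing interlinear format.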