In this paper we investigate a particular translation problem in Bangla and examine how a Principle-Based approach can handle it. We propose a restricted version of a bidirectional MT system between Bangla and Hindi that carries a parameter setting for the use of Classifiers in Bangla nominal expressions and their absence in Hindi ones. An exercise involving such a restricted operation might seem regressive at a time when NLP goals are shaped by discourse models. Some scholars might argue that the only fruitful NLP task is the analysis of sentences as they occur in real speech situations. However, our reading of the current state of affairs suggests that it is not a waste to break down the goal of building the ultimate NLP system into smaller subgoals. We believe that such a manoeuvre will yield far more encouraging short-term results.
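To make the Classifier parameter concrete, here is a minimal sketch, assuming a single hypothetical boolean parameter per language (`uses_classifier`) and one default Bangla classifier. The names `LangParams`, `generate_numeral_np` and `parse_numeral_np` are illustrative only. Forms are given in simplified ASCII transliteration, with a hyphen marking the numeral-classifier boundary (Bangla ek-ta boi and Hindi ek kitab, both 'one book'). The noun substitution (boi/kitab) is hard-coded here and is not part of the parameter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LangParams:
    """Hypothetical per-language parameter record (only the classifier parameter)."""
    name: str
    uses_classifier: bool
    classifier: str = ""  # default classifier, used only when uses_classifier is True

BANGLA = LangParams("Bangla", uses_classifier=True, classifier="ta")
HINDI = LangParams("Hindi", uses_classifier=False)

def generate_numeral_np(numeral: str, noun: str, lang: LangParams) -> str:
    """Linearise [Num N]; a classifier is inserted only if the parameter is set."""
    if lang.uses_classifier:
        return f"{numeral}-{lang.classifier} {noun}"  # Bangla: ek-ta boi 'one book'
    return f"{numeral} {noun}"                          # Hindi: ek kitab 'one book'

def parse_numeral_np(phrase: str, lang: LangParams) -> tuple:
    """Recover (numeral, noun); a classifier, if present, is dropped before the IL."""
    numeral, noun = phrase.split()
    if lang.uses_classifier:
        numeral = numeral.split("-", 1)[0]
    return numeral, noun

# Bangla -> Hindi: strip the classifier on analysis, omit it on generation.
num, _ = parse_numeral_np("ek-ta boi", BANGLA)
print(generate_numeral_np(num, "kitab", HINDI))  # ek kitab

# Hindi -> Bangla: the parameter forces a classifier on generation.
num, _ = parse_numeral_np("ek kitab", HINDI)
print(generate_numeral_np(num, "boi", BANGLA))   # ek-ta boi
```

Keeping the classifier out of the intermediate representation means the same analysis can feed generation in either direction; only the target-side parameter decides whether a classifier surfaces.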
As for the directionality of translation, the system operates bidirectionally: its general architecture consists of a language-independent interlingual (IL) representation that is acted upon simultaneously by two sub-components before a TL representation is produced. One component, which we call the Generate Tree Procedure (GENTREE), provides bare syntactic structures with the help of X'-theory and some other parameters. The other component, which we call the Principle and Parameters Component (PARACON2), hosts all the principles, the remaining parameters, and the constraints.
These two sub-components, which together form a larger component handling the syntactic procedures of the system, act interactively to produce substitution-ready IL representations. We recommend a bottom-up approach, since a top-down parser (and a parser with a dominant grammar component more often than not tends to become one) is not robust enough to deal with deviant expressions.
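The robustness argument for bottom-up processing can be illustrated with a toy example. The sketch below is a greedy, deterministic shift-reduce recogniser over category labels, using a hypothetical three-rule grammar (`RULES`); it is not the system's parser. Its point is that on deviant input it does not reject everything but returns the constituents it has already built.

```python
from typing import List, Union

# A tree is either a bare category label or a tuple (label, left, right).
Tree = Union[str, tuple]

# Hypothetical toy grammar: (left, right) -> parent, verb-final order.
RULES = {("Num", "N"): "NP", ("NP", "V"): "VP", ("NP", "VP"): "S"}

def label(t: Tree) -> str:
    """Category label of a leaf or of a built constituent."""
    return t if isinstance(t, str) else t[0]

def bottom_up(categories: List[str]) -> List[Tree]:
    """Shift each category, reducing greedily whenever the top two match a rule."""
    stack: List[Tree] = []
    for cat in categories:
        stack.append(cat)  # shift
        while len(stack) >= 2 and (label(stack[-2]), label(stack[-1])) in RULES:
            right = stack.pop()
            left = stack.pop()
            stack.append((RULES[(label(left), label(right))], left, right))  # reduce
    return stack  # one "S" for a full parse; several partial constituents otherwise

print([label(t) for t in bottom_up(["NP", "Num", "N", "V"])])       # ['S']
print([label(t) for t in bottom_up(["NP", "Num", "N", "V", "N"])])  # ['S', 'N']
```

In the second call the stray trailing item does not destroy the clause built before it. Because the recogniser is greedy and never backtracks, it would mis-handle genuinely ambiguous input; a chart-based bottom-up parser would be the usual remedy.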
The MT system involves two steps. In Step I, GENTREE applies and projects each lexical item to its maximal projection (subject to certain constraints of the complete phrase stage), attaches phrases relative to the Head, and predicts empty elements (such as traces in the prenominal position for Hindi and in the postnominal position for Bangla). This procedure generates trees that are underspecified for the values of various features. PARACON2 then checks each subtree locally for well-formedness and either returns modified structures or rules out certain structures on the basis of principles and constraints.
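Step I can be sketched as follows, under heavy simplifying assumptions. The X' level is elided, only one feature (`num`) is modelled, and the single "principle" is a hypothetical head-phrase agreement check. `Node`, `TRACE_SIDE`, `project`, `gentree_np` and `paracon2` are illustrative names, not the system's own. GENTREE builds an NP with the phrase-level feature left underspecified and places a predicted empty element (`t`) before the noun for Hindi and after it for Bangla. PARACON2 then walks the tree, percolating a value upward (modifying the structure) or returning `None` on a clash (ruling it out).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    cat: str                                      # e.g. "N", "NP", or "t" for an empty element
    word: str = ""                                # lexical item; empty for phrases and traces
    features: dict = field(default_factory=dict)  # a value of None means "underspecified"
    children: list = field(default_factory=list)

# Hypothetical parameter: side of the noun on which GENTREE predicts the empty element.
TRACE_SIDE = {"Hindi": "pre", "Bangla": "post"}

def project(word: str, cat: str, num: Optional[str] = None) -> Node:
    """GENTREE: project a lexical head straight to its maximal projection."""
    head = Node(cat, word, {"num": num})
    return Node(cat + "P", "", {"num": None}, [head])  # phrase feature left underspecified

def gentree_np(noun: str, lang: str, num: Optional[str] = None) -> Node:
    """GENTREE: build a bare NP and predict the position of an empty element."""
    np_ = project(noun, "N", num)
    trace = Node("t")
    if TRACE_SIDE[lang] == "pre":
        np_.children.insert(0, trace)  # prenominal (Hindi)
    else:
        np_.children.append(trace)     # postnominal (Bangla)
    return np_

def paracon2(node: Node) -> Optional[Node]:
    """PARACON2: check each subtree locally; return the (possibly modified) node, or None."""
    for i, child in enumerate(node.children):
        checked = paracon2(child)
        if checked is None:
            return None  # a ruled-out subtree rules out the whole structure
        node.children[i] = checked
    heads = [c for c in node.children if c.cat + "P" == node.cat]
    if heads:
        head_num, phrase_num = heads[0].features.get("num"), node.features.get("num")
        if head_num is not None and phrase_num is not None and head_num != phrase_num:
            return None  # feature clash: structure ruled out
        if phrase_num is None:
            node.features["num"] = head_num  # percolate the head's value: structure modified
    return node

tree = paracon2(gentree_np("kitab", "Hindi", num="sg"))
print([c.cat for c in tree.children], tree.features)  # ['t', 'N'] {'num': 'sg'}
```

Note that `paracon2` modifies the tree in place; a production version would more likely copy it so that rejected alternatives stay intact.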
In Step II, the substitution-ready, language-independent IL representations serve as inputs; their items are substituted by consulting the TL lexicon to derive the TL forms. Note that the IL forms can be translated into any TL, which means that exactly one parser and one generator are needed to translate between any language pair. This extensibility to other languages is a major advantage of the interlingual approach.
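Step II can be pictured as a recursive leaf substitution over an IL tree. The sketch assumes a hypothetical IL whose leaves are language-neutral concept labels (`ONE`, `BOOK`) and hypothetical TL lexica keyed by those labels; `substitute` and `leaves` are illustrative helpers. Lexical substitution alone is not the whole of generation: for Bangla the Classifier parameter would still have to apply, so only the Hindi output is linearised here.

```python
from typing import Dict, List, Union

# An IL tree is (category, child, ...) with concept labels at the leaves.
ILTree = Union[str, tuple]

# Hypothetical TL lexica keyed by language-neutral concept labels.
TL_LEXICA: Dict[str, Dict[str, str]] = {
    "Hindi": {"ONE": "ek", "BOOK": "kitab"},
    "Bangla": {"ONE": "ek", "BOOK": "boi"},
}

def substitute(tree: ILTree, lexicon: Dict[str, str]) -> ILTree:
    """Replace every IL concept leaf with its TL form; categories are kept."""
    if isinstance(tree, str):
        return lexicon[tree]
    cat, *children = tree
    return (cat, *(substitute(c, lexicon) for c in children))

def leaves(tree: ILTree) -> List[str]:
    """Left-to-right terminal string of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1:] for w in leaves(c)]

il = ("NP", ("Num", "ONE"), ("N", "BOOK"))
print(" ".join(leaves(substitute(il, TL_LEXICA["Hindi"]))))  # ek kitab
```

In this toy setting, adding a further target language amounts to adding one more lexicon (plus its parameter settings) rather than a new module for every language pair, which is the extensibility point made above.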
Tanmoy Bhattacharya
Universität Leipzig
Zentrum für Kognitionswissenschaften
GK "Universalität und Diversität"
Augustusplatz 10/11, 04109 Leipzig