
What is Anusaaraka?

Anusaaraka is English-Hindi language-accessing software. Drawing on insights from
Panini's Ashtadhyayi (grammar rules), it is a machine translation tool being
developed by the Chinmaya International Foundation (CIF), the International Institute of
Information Technology, Hyderabad (IIIT-H) and the University of Hyderabad (Department
of Sanskrit Studies). Anusaaraka is a fusion of traditional Indian shastras and advanced
modern technologies.

Anusaaraka is intended to let users read text in any Indian language after it has been
translated from the source language (English or another regional Indian language). In today's
Information Age, large volumes of information are available in English, whether for
competitive exams or for general reading. However, many educated people whose primary
language is Hindi or a regional Indian language are unable to access information in
English. Anusaaraka aims to bridge this language barrier by allowing a user to enter
English text and receive its translation in an Indian language. The version of Anusaaraka
described here has English as the source language and Hindi as the target language.

Anusaaraka derives its name from the Sanskrit word 'Anusaran' which means 'to follow'.
It is so called, as the translated Anusaaraka output appears in layers – i.e. a sequence of
steps that follow each other till the final translation is displayed to the user. Given below
is an example to make this clearer:

A  I am enjoying Gods Grace.
B  I be enjoy god Grace.
C  pronoun verb verb noun PropN
D  NP VP VP NP VP
E  मैं --> आनंद~लेना~{@ing}{0_रहा_है} ईश्वर Grace..
F  मैं -- आनन्द_उठा रहा हूँ ईश्वर Grace #का.

In the table above, row A displays the sentence to be translated. Row B gives the root
word for each word in the English sentence. Row C displays the part of speech of each
word in row B, e.g. verb, pronoun, noun or proper noun. Row D specifies whether each word
belongs to a noun phrase (NP) or a verb phrase (VP) within the sentence. Row E gives a
Hindi translation for each word, and the final row (row F) takes the tense, gender, number
and person of the words into account.
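
The layered idea can be pictured with a small Python sketch. It is only an illustration and
not Anusaaraka's actual code: the `Token` class and the function names are hypothetical, and
the values for rows A to D are copied by hand from the example rather than computed. Each
word carries all of its layers, so any layer, including the original English, can be read
back at any time.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # row A: the word as it appears in the sentence
    root: str     # row B: root form
    pos: str      # row C: part of speech
    phrase: str   # row D: phrase label (NP / VP)


# Hand-copied from rows A to D of the example; nothing here is analysed automatically.
ROWS = {
    "A": ["I", "am", "enjoying", "Gods", "Grace"],
    "B": ["I", "be", "enjoy", "god", "Grace"],
    "C": ["pronoun", "verb", "verb", "noun", "PropN"],
    "D": ["NP", "VP", "VP", "NP", "VP"],
}


def build_tokens(rows):
    """Zip the rows column by column so each word keeps every layer."""
    return [Token(*column) for column in zip(rows["A"], rows["B"], rows["C"], rows["D"])]


def layer(tokens, field):
    """Read one layer (one row of the table) back out of the tokens."""
    return [getattr(token, field) for token in tokens]


def source_sentence(tokens):
    """The source text is never discarded, so it can always be recovered."""
    return " ".join(layer(tokens, "surface"))


tokens = build_tokens(ROWS)
for name in ("surface", "root", "pos", "phrase"):
    print(name.ljust(8), " | ".join(layer(tokens, name)))
print(source_sentence(tokens))
```

Keeping every layer attached to the word, instead of overwriting it at each step, is the
design choice that makes it possible to trace a translated word back to its source.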

The table above shows the Anusaaraka output. Some further steps are then carried out at
the back end, and the final output is displayed after prepositions are moved to the correct
position. This step is called Machine Translation. Thus, the combined Anusaaraka and
Machine Translation output would be: मैं ईश्वर के Grace का आनन्द उठा रहा हूँ ।

(Please note: the software is still under development, so the word Grace has
not yet been translated in the above example.)

Salient Features of Anusaaraka


Faithful representation of text in source language:

Throughout the various layers of Anusaaraka output there is an effort to ensure that
the user should be able to understand the information contained in the English sentence.
This is given greater importance than giving perfect sentences in Hindi, for it would be
pointless to have a translation that reads well but does not truly capture the information
of the source text.

The layered output is unique to Anusaaraka. Through it, the user can access both the
information in the source language text and the way the Hindi translation was finally
arrived at. The important feature of the layered output is that information is transferred
in a controlled manner at every step, making it possible to go back to an earlier layer
without any loss of information. Any loss of information that cannot be avoided in
translation is introduced gradually. Therefore, even if the translated sentence is not
as 'perfect' as a human translation, with some effort and orientation on reading
Anusaaraka output, an individual can understand what the source text is implying by
looking at the layers and the context in which the sentence appears.

Reversibility:

The gradual transfer of information from one layer to the next gives Anusaaraka the
additional advantage of reversibility in the translation process, a feature which cannot
be achieved by a conventional machine translation system. Because the output is
transparent, a bilingual user of Anusaaraka can access the source language text in English
at any point. Some orientation on how to read the Anusaaraka output is required for this.

Transparency:

Displaying the translation layers step by step gives end-users greater confidence, as
they can trace back to the source and clarify the translated text by analysing the output
layers and referring to the context.

Philosophy

नेहाभिक्रमनाशोऽस्ति प्रत्यवायो न विद्यते ।
स्वल्पमप्यस्य धर्मस्य त्रायते महतो भयात् ॥
॥ श्रीमद्भगवद्गीता - २-४० ॥

In this there is no loss of effort, nor is there any harm
(production of contrary results). Even a little of this knowledge,
even a little practice of this Yoga, protects one from the great
fear.

ll Srimad Bhagawad Geeta - 2.40 ll

We believe:

Anusaaraka should ideally be done in the spirit of 'Karma Yoga', which implies:

- One may not have all the competence necessary for doing a task.
- Competence can be developed if one is willing.
- There is never any loss of effort and there is never failure.
- No achievement is impossible if people act in a cooperative spirit for the good of all.
- As one is developing competence – whatever contribution one can make is welcome
and appreciated.

Why Machine Translation?

Today technology has made it possible for individuals worldwide to access large volumes
of information at the click of a button. However, very often the information sought may
not be in a language that the individual is familiar with.

Thus, Machine Translation is an endeavor to minimize the language barrier, by
making it possible to access a text in the language of one's choice.

To provide this facility, technology has to deal with many aspects of language.

To name a few:
•Script
•Spelling
•Vocabulary
•Morphology
•Syntax

Keeping the above in mind, Machine translation systems need to be equipped to
translate a text within seconds and yet capture the information of the text to the best
possible extent.

Brief History of Machine Translation:

Machine Translation (MT) is not a new development. Given below are examples of some
MT systems, along with their strengths and weaknesses:

TAUM METEO
•High quality
•Fully automatic
•Very limited domain (translates weather bulletins from English to French)

SYSTRAN
•In practical use at many IT firms, such as AltaVista.
•Domain independent.
•Translation is unreliable (leading to misinterpretations)

MT efforts in India, at a glance

DOMAIN SPECIFIC
•Mantra system designed by C-DAC and used for translation of Government
appointment letters. This uses the 'Tree Adjoining Grammar' approach.
•A system designed by ERDCI & IIT – Kanpur, for translating Public Health campaign
documents. It uses the Angla Bharati approach.

APPLICATION SPECIFIC
•Matra - a human-aided MT tool, designed by NCST (now known as C-DAC)

GENERAL PURPOSE MT
•Angla Bharati approach (IIT-Kanpur)
•UNL based MT (IIT-Mumbai)
•Shiva: EBMT (IIIT-Hyderabad/IISc-Bangalore)
•Shakti: English-Hindi MT System (IIIT-Hyderabad)

Difficulties in MT

People often ask why Machine Translation efforts have had only limited success
thus far.

Machine Translation efforts aim at reducing the language barrier to ZERO! This is
almost like trying to build a machine that can walk like a human: rather ambitious,
if not impossible.

The following are examples of some key problems faced in machine translation:

Ambiguity

E.g.:1: राम फल खाता है । (rāma phala khātā hai.)

Challenge: Machine has difficulty in identifying कर्ता (the doer) and कर्म (the object).

Solution: Human beings use world knowledge to resolve this.

E.g.:2: Time flies like an arrow.

Possible parses: (Note: Parsing is a term used in Linguistics and implies the identification
of the grammatical role of each word in the sentence. In the following sentences: N
refers to Noun; V to Verb; Prep is used for preposition; Det for determiner.)

a) Time flies like an arrow
N V Prep Det N

b) Time flies like an arrow
N N V Det N
(Interpreted as: 'time flies' are fond of an arrow)

c) Time flies like an arrow
V N Prep Det N
(Interpreted as: the flies are like an arrow)

d) Time flies like an arrow
V N Prep Det N
(Emphasis on: manner of timing)

Challenge: Different interpretations are possible.

Solution: World knowledge and context.
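
How quickly such ambiguity multiplies can be seen in a short Python sketch. It is a toy
illustration under stated assumptions, not a real tagger: the `LEXICON` below is a
hypothetical hand-written word list, and the "exactly one verb" rule is a crude stand-in
for real grammatical knowledge.

```python
from itertools import product

# Hypothetical mini-lexicon: the tags each word may take (N, V, Prep, Det as in the text).
LEXICON = {
    "time": ["N", "V"],
    "flies": ["N", "V"],
    "like": ["Prep", "V"],
    "an": ["Det"],
    "arrow": ["N"],
}


def tag_sequences(sentence):
    """Return every combination of tags the lexicon allows for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    return [tuple(tags) for tags in product(*options)]


def has_one_main_verb(tags):
    """Crude assumption: a simple sentence has exactly one verb."""
    return tags.count("V") == 1


candidates = tag_sequences("Time flies like an arrow")
plausible = [tags for tags in candidates if has_one_main_verb(tags)]

print(len(candidates), "tag sequences in total")
for tags in plausible:
    print(" ".join(tags))
```

Even this five-word sentence yields eight tag combinations, and a simple rule still leaves
several of them standing. Tags alone also cannot separate readings that share the same tag
sequence but differ in meaning, which is why world knowledge and context are needed.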

E.g.:3: He saw a man on the hill with a telescope.

Possible readings:
saw with a telescope
man with a telescope
hill with a telescope

Challenge: Different interpretations are possible.

Solution: World knowledge and context.
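
The readings listed for the telescope sentence can be generated mechanically, as the
following Python sketch shows. The function name and its arguments are hypothetical and
chosen only for illustration. The sketch simply lets the trailing phrase attach to the verb
or to any earlier noun, and makes no attempt to pick the right reading.

```python
def attachment_readings(verb, nouns, modifier):
    """List every head (the verb or an earlier noun) the trailing phrase could attach to."""
    heads = [verb] + list(nouns)
    return [f"{head} {modifier}" for head in heads]


readings = attachment_readings("saw", ["man", "hill"], "with a telescope")
for reading in readings:
    print(reading)
```

Listing the candidates is the easy part. Choosing among them is where a machine needs the
world knowledge and context mentioned above.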

E.g.:4: Patient had a stiff neck and fever

Patient had [stiff neck] and [fever]
Patient had stiff [neck and fever]

Challenge: Different interpretations are possible.

Solution: General awareness.

E.g.:5: 'Table the resolution'

U.S.A.: To postpone
U.K.: To present before the committee

Challenge: Different interpretations in different countries.

Solution: Knowledge of British vs. American usage.

Dereferencing

E.g.: 1:
The farmer's wife sold the cow because she needed money.
The farmer's wife sold the cow because she was not giving milk.

We know that 'she' in the above sentences can refer to either the 'wife' or the 'cow',
giving different interpretations.
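
A human reader settles this with common sense. The Python sketch below imitates that with a
tiny hand-written table. Everything in it is an assumption made for illustration: the
`CAN_DO` table is hypothetical, and a real system would need vastly more knowledge than
two entries.

```python
# Hypothetical world knowledge: which candidates can plausibly be the subject of each phrase.
CAN_DO = {
    "needed money": {"wife"},
    "was not giving milk": {"cow"},
}


def resolve_pronoun(candidates, predicate):
    """Keep only the candidates that world knowledge allows for this predicate.

    If the predicate is unknown, every candidate is kept, so the ambiguity remains.
    """
    allowed = CAN_DO.get(predicate, set(candidates))
    return [candidate for candidate in candidates if candidate in allowed]


candidates = ["wife", "cow"]
print(resolve_pronoun(candidates, "needed money"))         # ['wife']
print(resolve_pronoun(candidates, "was not giving milk"))  # ['cow']
print(resolve_pronoun(candidates, "was old"))              # ['wife', 'cow']
```

The last call shows the limit of the approach: as soon as the table has no entry, the
machine is back to both possibilities.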

E.g.: 2:
The mother with babies under four ...
The mother with babies under forty ...

Clue: Common Sense.

E.g.: 3:
William James cited Mozart's discussion on his composition.
Marvin Minsky cited McCorduck's discussion on his research.

Clue: Knowledge of historical background is important.

Incomplete/ungrammatical sentences

Doctors' diagnosis notes:

• stiff neck and fever.
• brain scan negative.

As the examples above show, interpretation depends on common sense and knowledge of the
world, which the machine does not have. This is what makes the task of Machine translation
challenging. Interestingly, even for a human being, if two languages are similar, the
challenge of interpretation is not very great. However, as the incompatibility increases
(as in the case of English and Hindi), the task of interpretation becomes more complex,
even for a human being. Here we are attempting to get the machine to do the same task, so
the magnitude of the challenge increases manifold.

Participation

Language Resource development has been considered to be the domain of people
specifically trained in Linguistics. One of the fascinating aspects of Anusaaraka is that it
has areas in which anyone, like you and me, can participate.

The eligibility criteria are:

a) Willingness to spend some time on Anusaaraka.
b) Knowledge of English and Hindi.
c) Willingness to contribute in the spirit of dedication, for the good of all: i.e. the यज्ञ
(yajna) spirit.

To many it may seem absurd that the invitation to participate in the development of
Anusaaraka is being extended to all. Pujya Gurudev Swami Chinmayananda ji says:
"No achievement is impossible for man, if he knows how to act in the discipline of
co-operation."

We have already had first-hand experience of this: so-called non-technical people
have played a crucial role in building Anusaaraka thus far.

We shall assist you in developing the necessary competence to participate in Anusaaraka
Language Resource development.

Participation Options:

a. Support with your time:

Volunteer to contribute to the Language Resource Development task if you know English
and Hindi and are willing to dedicate some time to this work every week.

Register yourself as a Participant at this site and join the global group of Anusaaraka
voluntary participants who contribute and enrich the translation quality. Participate,
share and explore the fascinating world of language.

We assure you that no formal training in Linguistics or Translation is required for this
task. The Guidelines prepared for the task are sufficient to get anyone started. The
majority of the Participants to date have learnt on the job, besides attending some
workshops.

(Please note: Individuals who are interested but are not able to contribute without
payment should write to: anusaaraka@chinfo.org to discuss their
association with Anusaaraka.)

b. Support with publicity:

Tell your relatives, friends and acquaintances who have some time to spare and may be
interested, to visit this site and register as Participants.

Share this information with Research scholars who are working in the areas of
Linguistics, Natural Language Processing (NLP), Computational Linguistics, Sanskrit
Grammar (Panini's Ashtadhyayi).

c. Support with resources:

Some of the resources that you can assist with are listed below:
I) Soft copies of books and material (for which copyright permission need not be sought),
both in English and Hindi, as resources for the Corpora.
II) Infrastructural resources: computers, server space, venues for organizing two-day
workshops on Anusaaraka.

Please send an email to: anusaaraka@chinfo.org giving details of how you wish to
support the project.

d. Support with funds:

Interested individuals or organisations who are unable to support with their time could
explore the option of monetary assistance. Funds received shall go towards:
•Rewarding Participants who are not volunteers;
•Giving scholarships to students who take up Anusaaraka as their research topic;
•Creating a Corpus Fund for Research, aimed at applying traditional Indian
shastra knowledge in the contemporary world.

Please click here to download the Donation Letter which you are requested to fill in and
send along with your financial contribution.

Your contribution (Money-order/Draft/Cheque) may be mailed to the Chinmaya
International Foundation at the following address:

The Manager,
Chinmaya International Foundation,
Adi Sankara Nilayam, Veliyanad – 682 319.
Ernakulam District, Kerala, India.

All Cheques, Drafts and Money-orders are to be made in favour of 'CHINMAYA
INTERNATIONAL FOUNDATION', payable at the State Bank Of Travancore, Piravom.
Your donations are exempt from tax u/s 80G of the Income Tax Act. To make online donations,
please write to: office@chinfo.org
