ANUVADAKSH – An Expert English to Indian Languages Machine Translation System (EILMT) (Consortium under DIT)
ANUVADAKSH, 'An Expert English to Indian Languages Machine Translation System [EILMT]', translates text from English into eight Indian languages: Hindi, Urdu, Bengali, Marathi, Tamil, Oriya, Gujarati and Bodo. The domains covered are Tourism and Healthcare. The project is funded by DeitY, Government of India, under the TDIL programme, "Technology Development for Indian Languages".
This is a multilingual, multi-platform, multi-engine hybrid system being developed by a consortium of 13 institutes in India: C-DAC Pune, C-DAC Mumbai, IIIT Hyderabad, IISc Bangalore, IIT Mumbai, Jadavpur University (Kolkata), Amrita University (Coimbatore), IIIT Allahabad, Banasthali Vidyapeeth (Banasthali), Utkal University (Bhubaneswar), Dharmsinh Desai University (Nadiad), North Maharashtra University (Jalgaon) and North Eastern Hill University (Shillong). The Applied AI Group at C-DAC Pune is the consortium leader. The system is intended to serve the multilingual community, initially for domain-specific expressions in Tourism and Health, and is to be extended to other domains in phases.
This is a collaborative effort in which the consortium institutes have integrated three machine translation technologies: TAG (Tree-Adjoining Grammar based MT), SMT (Statistical MT) and EBMT (Example-Based MT). Associated modules such as the Named Entity Recognizer (NER), Word Sense Disambiguation (WSD), LRMT (Linguistic Resource Management Tool) and MT evaluation methodologies, along with language vertical tasks, are handled by the respective consortium members.
EILMT System - Salient Features:
The system offers the following:
- Various communication protocols, such as:
  - Transactions via the TDIL web portal
  - A web service, as used by LPMF (Localization Project Management Framework)
- A pre-processing module that prepares the input text in an engine-suitable form with the help of:
  - Text extraction from uploaded files in .rtf, .xls, .txt and .html formats for translation
  - Input Format Extractor
  - Morphological Analyzer
  - Part of Speech Tagger
  - Named Entity Recognizer covering names, places, dates, etc.
  - Word Sense Disambiguator
  - Noun/Phrase Chunking and Clause Identification
- Three translation engines, EBMT, SMT and TAG, working in parallel to provide translation for all eight language pairs.
- A Collation & Ranking Module that collates the translated outputs of all the engines and ranks them by translation accuracy.
- A post-processing module that provides additional features for the EILMT translation engine, such as:
  - Morph Synthesizer for smoothing the translated output
  - Multiple translation options
  - Synonym selection option
  - Typing facility in the target language for all eight languages
  - Transliteration facility
  - Retention of the original format of the English text
- Natural Language Processing components are available
- Feedback on Translation
- W3C-compliant system
- Cross-browser compatibility with Internet Explorer, Mozilla Firefox, Google Chrome, Apple Safari and Opera
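As an illustration of the parallel engines and the Collation & Ranking Module listed above, the following is a minimal Python sketch and not the actual EILMT implementation. The engine functions, the `Candidate` type, the `translate` helper and the fixed scores are all hypothetical placeholders; how the real system estimates translation accuracy is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    engine: str   # name of the engine that produced the output
    text: str     # translated text
    score: float  # hypothetical accuracy estimate in [0, 1]


# Placeholder engines (assumptions): each returns a tagged copy of the
# input and a fixed score. Real EBMT/SMT/TAG engines are not modelled.
def ebmt_engine(source: str) -> Tuple[str, float]:
    return f"[EBMT] {source}", 0.62


def smt_engine(source: str) -> Tuple[str, float]:
    return f"[SMT] {source}", 0.81


def tag_engine(source: str) -> Tuple[str, float]:
    return f"[TAG] {source}", 0.74


Engine = Callable[[str], Tuple[str, float]]

ENGINES: Dict[str, Engine] = {
    "EBMT": ebmt_engine,
    "SMT": smt_engine,
    "TAG": tag_engine,
}


def translate(source: str, engines: Dict[str, Engine] = ENGINES) -> List[Candidate]:
    """Run every engine in parallel, then collate and rank the outputs."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Submit the same source text to each engine.
        futures = {name: pool.submit(fn, source) for name, fn in engines.items()}
        # Collate: wait for each engine and wrap its output.
        candidates = [
            Candidate(name, *future.result()) for name, future in futures.items()
        ]
    # Rank: highest estimated accuracy first.
    return sorted(candidates, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    for candidate in translate("Welcome to Pune"):
        print(f"{candidate.engine}: {candidate.text} ({candidate.score:.2f})")
```

Returning the full ranked list rather than only the top candidate mirrors the "Multiple translation options" feature, where the user can pick among alternative outputs.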
Contact Us
Centre for Development of Advanced Computing
Applied AI Group, 4th Floor,
C-DAC Innovation Park,
Panchavati, Pashan,
Pune - 411 008.
Phone No.: (020) 25503314 / 15
Email: info.aai@cdac.in