Multilingual and Heritage Computing

Mission:
Dissolving Language Barriers to place the power of computing and e-contents in the hands of the people of India

 

India is a country with 22 official languages, and the use of computers is spreading fast, not only to create employment in the IT sector but also to support productive use of IT in daily life: to increase productivity and competitiveness, provide a better quality of life, enable inclusiveness and strengthen democracy. For different sections of people to be able to use computers (and, increasingly, text and data over mobile phones), the Basic Information Processing Kit for Indian languages must be constantly upgraded for various hardware and software platforms, new tools must be added, and work must be promoted with developers, ISVs, System Integrators and application developers to enable and support the use of Indian languages in different sectors/verticals. Increasingly, Indian language content in digital form also has to be created and supported so that applications can reach a critical mass.

Also, a whole range of emerging technology tools and capabilities, from Machine-assisted Translation and OCR/OHR to Cross Lingual Information Retrieval (CLIR), Web 2.0, Indian language browsers, speech interfaces (text to speech, speech to text and speech to speech) and search engines, which are mature in English and fast emerging in other major languages, are supported in Indian languages as well. In addition, development of support for the .IN domain with Indian language domain names is another target area of work.

In Multilingual Computing and Allied Areas, C-DAC continues to work towards the design, development and deployment of technologies/solutions for the following areas:

  • Speech Processing
    • Speech Recognition
    • Speech Synthesis

  • Natural Language Processing (NLP)
    • Machine Translation
    • Information Extraction & Retrieval (IR)
    • Semantic Search

  • Optical Character Recognition (OCR)
    • Indian Languages OCR
    • Indian Language On-Line Handwriting Recognition (OHR)

  • Localisation
    • Fonts (TTF & OTF) for Indian Languages
    • Data Processing Tools
    • Standardization in Localization benefiting e-governance
    • Localisation of Middleware
    • IDN & E-mail Id in local languages
    • Transliteration amongst Indian Languages

 

Speech Processing

  • Speech Recognition
    • Speech corpus creation, analysis and management tools
    • Phoneme and grapheme mapping tools
    • Text conversion tools
  • Speech Synthesis
    • Speech corpus creation, analysis and management tools
    • Phoneme and grapheme mapping tools
    • Text parsing tools
    • Speech synthesis tools
    • Learning and training modules
    • Speech parameter control module
    • Intonation and prosodic rule generation

Natural Language Processing

  • Machine Translation
    • Corpus creation, analysis and management tools
    • Pre-processing and post-processing tools
    • Parsing and generation tools
  • Information Extraction and Retrieval (IR)
    • English/Hindi IE/IR system for the domains of Banking, Agriculture, Railways and Mobile Services
    • Cross-lingual IE/IR system using translation systems developed for specific domains
    • Knowledge-based as well as generic search engines
    • Summarizer for English and Hindi, etc.

  • Semantic Search
    Semantic Search attempts to augment and improve traditional search results (based on Information Retrieval technology) by using data from the Semantic Web. The work here adds Indian language support to Semantic Search.

Optical Character Recognition (OCR)

  • Indian Languages OCR
    • Language independent components, such as image cleaning, skew adjustment, image detection, column detection, table detection, etc.
    • Font training module.
    • Document analysis module backed by dictionaries, spell checker and auto language detection tools.
    • Aligning analyzer, recognition and generator modules.
  • Indian Language On-Line Handwriting Recognition (OHR)
    • Language independent components, such as image cleaning, skew adjustment, image detection, column detection, table detection, etc.
    • Document analysis module backed by dictionaries, spell checker and auto language detection tools.
    • Aligning analyzer, recognition and generator modules.

Localisation

  • Fonts (True Type Font & Open Type Font) for Indian Languages
    Several TTFs and OTFs have been developed for various Indian languages, including Tamil, Hindi, Telugu, Punjabi, Assamese, Bangla, Gujarati, Kannada, Malayalam, Marathi, Oriya and Sanskrit.
  • Multilingual data processing tools
    • Word Processors, DTP Solutions and SDKs for various Indian languages
    • Font engineering and design for all Indian languages, including tools for font design
    • Emotive Fonts
    • E-mail: Web-based and desktop clients
    • Content creation tool (desktop-based)
    • Standards development for Vedic: ISCII for Samaveda, Unicode for all Vedas
    • Vedic Editor with transliteration to Grantha and Roman with Vedic accents
    • Vedic transliteration to Kannada, Telugu, Malayalam, Sanskrit
    • Expert writing tools: feature additions such as Spell Checker, Grammar Checker, Dictionaries, Thesaurus, typing assistance tools, Predictive Writing, Digital Pen input, etc.
    • Smart Search Engines.

  • Standardization in Localization benefiting e-governance
    Development of localization tools to enable the use of Indian languages in e-governance applications, including:
    • Tools for Indian language GUI design
    • An integrated tool for data conversion, such as font conversion, database conversion and doc/txt/excel/ppt file conversion
    • Integration of a transliteration utility for names and addresses in all Indian languages
    • Interface with Indian language OCR
    • Development of lexical databases, such as dictionaries and terminology banks, for different domains such as IT, Agriculture and Railways, in all Indian languages

  • Localization of Middleware
    Enabling internationalization standards in middleware development, to establish a standard set of specifications for delivering middleware applications localized for Indian languages, using:
    • Unicode universal character set encoding
    • A locale-inspired mechanism for specifying a client's language
    • Support for multiple text ordering methodologies
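
One way a locale-inspired mechanism for specifying a client's language could look is sketched below. It assumes, purely for illustration, that the client sends an HTTP-style `Accept-Language` header; the source does not say which mechanism is used. The function names `parse_accept_language` and `pick_language` are hypothetical.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first.

    Assumption: the client states its language preference in an HTTP-style
    header such as "ta-IN,hi;q=0.8,en;q=0.5".
    """
    ranked = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(ranked)]


def pick_language(header, supported, default="en"):
    """Pick the best supported language for the client, else the default."""
    by_lower = {name.lower(): name for name in supported}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # fall back from "ta-in" to "ta"
        if primary in by_lower:
            return by_lower[primary]
    return default
```

For example, `pick_language("ta-IN,hi;q=0.8,en;q=0.5", ["hi", "ta", "en"])` selects `"ta"`, because the regional tag `ta-IN` falls back to its primary language.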

  • IDN & E-mail Id in local languages
    To enable the use of characters from any Indian language in Web addresses. A Web address in one's own language and alphabet is easier to create, memorise, transcribe, interpret, guess and relate to. This, in turn, is better for the Web. This includes
    • Domain names registration
    • Internationalized Resource Identifier
    • E-mail addresses with your own name in Indian languages
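
As a rough illustration of how an Indian language domain name maps onto the ASCII-only DNS, the sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; a production registry would more likely follow IDNA 2008. The domain name used here is a made-up example, not a registered name taken from the source, and the helper names are hypothetical.

```python
def to_ascii_domain(domain):
    """Convert a Unicode domain name to its ASCII-compatible (Punycode) form."""
    # The codec processes each dot-separated label separately.
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain):
    """Convert an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Hypothetical Devanagari domain, used only for illustration.
    name = "\u0909\u0926\u093e\u0939\u0930\u0923.\u092d\u093e\u0930\u0924"
    ascii_name = to_ascii_domain(name)
    print(ascii_name)                     # every label starts with "xn--"
    print(to_unicode_domain(ascii_name))  # the original Devanagari name
```

Only the domain part is covered here. The local part of an e-mail address (the text before `@`) cannot be encoded this way and needs mail servers that accept UTF-8 addresses.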

  • Transliteration amongst Indian Languages
    Development of a transliteration system that converts between Indian languages, using a common phonetic (Roman) format as the intermediate representation
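
The idea of transliterating through a common phonetic (Roman) format can be sketched as follows. This is a minimal illustration, not C-DAC's implementation: the tables cover only a handful of Devanagari and Gujarati characters, and the Roman token names, the `SCRIPTS` table and the function names are all assumptions made for the example.

```python
# Tiny illustrative tables; a real system would cover each script in full.
SCRIPTS = {
    "deva": {  # Devanagari
        "consonants": {"\u0915": "k", "\u0924": "t", "\u0928": "n",
                       "\u092d": "bh", "\u092e": "m", "\u0930": "r"},
        "vowels": {"\u0905": "a", "\u0906": "aa", "\u0907": "i"},  # independent
        "signs": {"\u093e": "aa", "\u093f": "i"},                  # dependent
        "virama": "\u094d",
    },
    "gujr": {  # Gujarati
        "consonants": {"\u0a95": "k", "\u0aa4": "t", "\u0aa8": "n",
                       "\u0aad": "bh", "\u0aae": "m", "\u0ab0": "r"},
        "vowels": {"\u0a85": "a", "\u0a86": "aa", "\u0a87": "i"},
        "signs": {"\u0abe": "aa", "\u0abf": "i"},
        "virama": "\u0acd",
    },
}


def to_roman(text, script):
    """Convert script text to a list of Roman phonetic tokens."""
    table = SCRIPTS[script]
    tokens = []
    inherent = False  # True while the last consonant still carries its 'a'
    for ch in text:
        if ch in table["consonants"]:
            if inherent:
                tokens.append("a")
            tokens.append(table["consonants"][ch])
            inherent = True
        elif ch in table["signs"]:
            tokens.append(table["signs"][ch])  # replaces the inherent vowel
            inherent = False
        elif ch == table["virama"]:
            inherent = False                   # suppresses the inherent vowel
        elif ch in table["vowels"]:
            if inherent:
                tokens.append("a")
            tokens.append(table["vowels"][ch])
            inherent = False
        else:
            raise ValueError(f"character {ch!r} is not in the {script} table")
    if inherent:
        tokens.append("a")
    return tokens


def from_roman(tokens, script):
    """Render Roman phonetic tokens in the target script."""
    table = SCRIPTS[script]
    consonants = {roman: ch for ch, roman in table["consonants"].items()}
    vowels = {roman: ch for ch, roman in table["vowels"].items()}
    signs = {roman: ch for ch, roman in table["signs"].items()}
    out = []
    after_consonant = False
    for token in tokens:
        if token in consonants:
            if after_consonant:
                out.append(table["virama"])  # consonant cluster
            out.append(consonants[token])
            after_consonant = True
        elif token in vowels:
            if after_consonant:
                out.append("" if token == "a" else signs[token])
            else:
                out.append(vowels[token])
            after_consonant = False
        else:
            raise ValueError(f"token {token!r} is not in the {script} table")
    if after_consonant:
        out.append(table["virama"])
    return "".join(out)


def transliterate(text, source, target):
    """Transliterate via the Roman intermediate representation."""
    return from_roman(to_roman(text, source), target)
```

Keeping the intermediate form as a list of tokens rather than a flat string avoids ambiguity between, say, the single token `bh` and the sequence `b`, `h`. With this design each script needs only one table to and from Roman, instead of a separate table for every pair of scripts.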

 
