Research Areas - Standardisation

 

GIST contributions towards Standardization in Indian Language Computing - An Overview

Need for standards - Even today, basic hardware systems and software applications are designed and developed with only English in mind. To promote the acceptance and use of Indian languages, the Indian language implementation needs to sit on top of existing applications and hardware frameworks. GIST focuses on all 22 official Indian languages. Of these, Assamese, Bengali, Bodo, Dogri, Hindi, Gujarati, Kannada, Konkani, Marathi, Malayalam, Maithili, Nepali, Oriya, Punjabi, Santhali, Tamil and Telugu are written left to right, while Urdu, Sindhi and Kashmiri are mostly written right to left. There are several overlaps: one language may use multiple scripts (e.g. Konkani may be written in Devanagari, Kannada or Roman), and one script, such as Devanagari, may cater to multiple languages. For any application to reach the masses of India, it must support Information Technology in the various languages of India.

On the Web and Mobile platforms, GIST has researched various aspects of the W3C recommendations and submitted its findings for several languages, including those written in right to left scripts. This activity is especially important for bridging the digital divide and extending the use of Indian languages to modern media such as television, handheld PDAs and information access points.

C-DAC GIST has participated in various standardization activities pertaining to language technology. It is also involved in standardization of heritage scripts of India.

Standardization

W3C (World Wide Web Consortium)

Introduction
Under the aegis of DIT, C-DAC GIST has prepared a draft report on the representation of seven languages with respect to the various recommendations of the W3C. Four of these, Gujarati, Marathi, Konkani and Dogri, belong to the Brahmi family and are displayed in Left To Right (LTR) mode, while Sindhi, Kashmiri and Urdu, which use the Perso-Arabic script, are displayed in Right To Left (RTL) mode. C-DAC GIST has extensively researched the various aspects of Localization (l10n) and Internationalization (i18n).
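The LTR and RTL distinction can be illustrated with a minimal Python sketch. It is not taken from the GIST report: it only approximates the "first strong character" idea from the Unicode Bidirectional Algorithm, using the standard library's unicodedata module. The function name base_direction and the sample words are assumptions chosen for illustration.

```python
import unicodedata

# Bidi classes treated as "strong" by the Unicode Bidirectional Algorithm.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}  # AL covers Arabic-script letters (Urdu, Sindhi, Kashmiri)


def base_direction(text: str) -> str:
    """Return 'ltr' or 'rtl' based on the first strongly directional character.

    Falls back to 'ltr' when the text has no strong character
    (for example, digits or punctuation only).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return "ltr"


if __name__ == "__main__":
    # Illustrative sample words only, one per script.
    samples = {
        "Gujarati": "ગુજરાતી",
        "Marathi": "मराठी",
        "Urdu": "اردو",
        "Sindhi": "سنڌي",
    }
    for language, word in samples.items():
        print(language, base_direction(word))
```

A real renderer applies the full bidirectional algorithm, including embedding levels and mirroring; this sketch covers only the choice of base direction for a run of text.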

The broad areas of research and recommendations include:

Dynamic CSS Tester
Dynamic CSS Tester is a tool for comparing the effects of various CSS styles applied to UTF-8 data. It lets you preview the same text side by side under different CSS styles. This case study aims to investigate issues in the rendering or display of Indian language content in UTF-8 and the effects of various CSS styles on it. To use it, enter the text you would like to preview, then modify the styles until you find a style set you want. If you see any problem with the applied CSS, take a screenshot of the problem and send it to us by mail using the Feedback link on the same page. If you feel that your data is not rendered correctly in the mail, click the GetSample button on the page, copy the code it generates in the text box next to the button, and paste that into the mail. GIST will verify and consolidate the feedback you send and forward it to the W3C for further action.

Internationalized Domain Names
In this age of Information Technology (IT), with the entire globe being integrated into a web-linked village in which knowledge is the sole differentiator, the development of convivial Access Technology has gained prime importance. For India especially, with its diverse and multilingual heritage and culture, the Internet is expected to play a dominant role in integrating almost all aspects of social and economic endeavor.

Introduction
GIST undertook research and study of various RFCs and their applicability to Indian languages under the guidance of the DIT.
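As a rough illustration of what such RFCs specify, the sketch below converts a Devanagari domain label to its ASCII-compatible form and back using Python's built-in punycode codec (RFC 3492). It is a minimal sketch, not GIST's method: the helper names are hypothetical, the domain used is a made-up example, and the normalization and validation steps that full IDNA processing requires are deliberately omitted.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible encoded label


def label_to_ascii(label: str) -> str:
    """Encode one domain label; plain ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Decode one label if it carries the ACE prefix; otherwise return it as is."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(domain: str) -> str:
    """Encode every dot-separated label of a domain name."""
    return ".".join(label_to_ascii(part) for part in domain.split("."))


def domain_to_unicode(domain: str) -> str:
    """Decode every dot-separated label of a domain name."""
    return ".".join(label_to_unicode(part) for part in domain.split("."))


if __name__ == "__main__":
    hindi_domain = "उदाहरण.भारत"  # hypothetical name used only for illustration
    encoded = domain_to_ascii(hindi_domain)
    print(encoded)
    print(domain_to_unicode(encoded) == hindi_domain)
```

Production code would normally rely on a complete IDNA implementation rather than calling the punycode codec directly, since script-specific rules (permitted code points, joiners, confusable characters) are applied before encoding.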

The research focused on domain names in Indian languages such as Hindi, Gujarati and Urdu, and included the following:

E-GOVERNANCE

GIST has contributed to recommendations related to the entire lifecycle of developing Indian language compliant e-governance applications.

These recommendations arise from C-DAC GIST's expertise in Indian languages and use of GIST tools and technologies in various large-scale, Indian language data-centric e-governance projects.

C-DAC GIST Tools have been used in several turnkey G2C (Government to Citizen) applications both at state and central level. GIST has also assisted several agencies in implementing various medium and large-scale projects.

GIST also participates in various forums for the standardization of the languages of India.

GIST is working towards storage, input and display standards for Bodo, Santhali, Dogri, Maithili, etc., which have recently been added to the list of official languages.

Linguistic Formats and Heritage scripts

Storage

Input

Display Fonts

OPEN TYPE FONTS - For UNICODE support in various applications, GIST Labs has developed Open Type Fonts for various scripts including Urdu (Naskh as well as Nastaleeq/Nastaliq), Sindhi and Kashmiri. Many modern operating systems support OT Fonts for viewing UNICODE data. Several GIST Tools have also been upgraded to support the OT-Font technology.

ISFOC - Intelligence based Script FOnt Code: The primary rule of thumb for typography is that if the text does not look good, we do not feel like reading it. Good typography is characterized by well-structured letterforms in a particular font, pleasant inter-letter spacing, ideal word spacing and healthy line spacing. Emphasis has been placed on text composition (horizontal as well as vertical), on final reproduction on output devices such as screens and printers, and on aesthetic rendering and display for True Type Fonts.

Naming conventions for GIST True Type (TT) fonts

1. A. Mnemonics: Assamese (AS), Bengali (BN), Devanagari (DV - catering to Hindi, Marathi, etc.), Gujarati (GJ), Kannada (KN), Malayalam (ML), Manipuri (MN), Oriya (OR), Punjabi (PN), Tamil (TM), Telugu (TL)

or

1. B. Corresponding Bilingual: ASB, BNB, DVB, GJB, KNB, MLB, MNB, ORB, PNB, TMB, TLB

or

1. C. Corresponding Bilingual Web: ASBW, BNBW, DVBW, GJBW, KNBW, MLBW, MNBW, ORBW, PNBW, TMBW, TLBW

or

1. D. Corresponding Monolingual Web: ASW, BNW, DVW, GJW, KNW, MLW, MNW, ORW, PNW, TMW, TLW

2. Followed by hyphen
3. TT - indicating True Type Font
4. Name of the font: Surekh, Yogesh, Mukta, Amar…
5. Numerals: EN indicates English numerals (optional). Tamil, Telugu and Malayalam support only English numerals.

Example : "GJBW-TTAvantikaEN" is
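The naming convention listed above can be sketched as a small Python parser. This is an illustrative reading of rules 1 to 5, not a GIST tool: the names GistFontName and parse_font_name are hypothetical, and the decoding of the example name is this sketch's interpretation of the rules rather than an official expansion.

```python
import re
from typing import NamedTuple, Optional

# Script mnemonics from rule 1.A.
SCRIPTS = {
    "AS": "Assamese", "BN": "Bengali", "DV": "Devanagari", "GJ": "Gujarati",
    "KN": "Kannada", "ML": "Malayalam", "MN": "Manipuri", "OR": "Oriya",
    "PN": "Punjabi", "TM": "Tamil", "TL": "Telugu",
}

# mnemonic [B] [W] "-" "TT" font-name ["EN"]
FONT_NAME = re.compile(
    r"^(?P<script>[A-Z]{2})(?P<bilingual>B?)(?P<web>W?)"
    r"-TT(?P<font>.+?)(?P<numerals>EN)?$"
)


class GistFontName(NamedTuple):
    script: str
    bilingual: bool
    web: bool
    font: str
    english_numerals: bool


def parse_font_name(name: str) -> Optional[GistFontName]:
    """Split a font name into its parts, or return None if it does not fit."""
    match = FONT_NAME.match(name)
    if match is None or match["script"] not in SCRIPTS:
        return None
    return GistFontName(
        script=SCRIPTS[match["script"]],
        bilingual=bool(match["bilingual"]),
        web=bool(match["web"]),
        font=match["font"],
        english_numerals=match["numerals"] is not None,
    )


if __name__ == "__main__":
    print(parse_font_name("GJBW-TTAvantikaEN"))
```

Under these assumptions, "GJBW-TTAvantikaEN" reads as a Gujarati bilingual web True Type font named Avantika with English numerals. A font whose own name ends in the capital letters "EN" would be misread by this pattern, so a real implementation would need a list of known font names.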

For more details, please contact:

More information on GIST products
E-Mail:
info[dot]gist[at]cdac[dot]in

Sales related information
E-Mail:
sales[dot]gist[at]cdac[dot]in

Support related information
E-Mail:
support[dot]gist[at]cdac[dot]in