Product Information

Chitrankan OCR for Indian languages

Chitrankan uses state-of-the-art OCR technology to make data entry much faster.

Brief Description

In today's knowledge-based society, digital information plays a key role. To be easily accessible, information must be in digital form. Over time, organizations accumulate vast volumes of legacy data and newly created records in physical form, and retrieving specific information quickly from those physical records becomes almost impossible. OCR (Optical Character Recognition) technology addresses this problem.

Our vast treasure troves of literary heritage, comprising printed books, manuscripts, documents, etc., are becoming increasingly difficult to protect and preserve. Over the years, many of these documents have become so fragile that they can no longer be handled or read safely. Digitization is the way to preserve the information in old documents such as newspaper cuttings and critical academic and technical works.

Even documents that were originally created digitally but are now available only in print (newspapers, magazines, corporate reports, etc.) must be converted back into machine-readable text before they can be used in data analytics applications. Modern OCR techniques make it possible to preserve information, help offices go 'paper-free', and make information easily searchable.

Chitrankan (powered by AI) uses Optical Character Recognition (OCR) technology to speed up data entry and reduce human effort, and it also supports machine translation of the OCR output once that output has been validated. OCR converts document images into digital text through automatic text line detection and recognition. The OCR web application is available at https://chitrankan.ebhasha.in/.
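The internals of Chitrankan's line detector are not described here. Purely as a generic illustration of what "text line detection" means, the sketch below uses the classic horizontal projection profile technique, which is not necessarily what Chitrankan uses. It assumes the page image has already been binarized (1 = ink, 0 = background) and deskewed; the function name detect_text_lines and its min_ink parameter are hypothetical.

```python
from typing import List, Tuple


def detect_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start_row, end_row) bands that contain text.

    end_row is exclusive. Assumes a binarized, deskewed image where
    1 = ink pixel and 0 = background. A row counts as part of a text
    line when it has at least `min_ink` ink pixels.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None
    for r, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = r  # entering a band of text rows
        elif count < min_ink and start is not None:
            lines.append((start, r))  # leaving the band
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # band touches the bottom edge
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 0],
    ]
    print(detect_text_lines(page))  # [(1, 3), (4, 5)]
```

Each detected band would then be passed to a recognizer; a production OCR system typically also has to cope with multi-column layouts, skew, and noise, which this simple profile does not handle.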

Use Cases

Office automation
Archival of text material
Business card reader
Data entry
E-book generation
Searchable menu
Signboards
Number plates

Domains:

Banking, Legal, Healthcare, Education, Finance, Government agencies

Salient Features

Input image formats:
PDF, BMP, JPG, JPEG, PNG, TIFF
Output formats:
Recognized Unicode text can be exported as UTF-8 encoded plain text (TXT).
*DOCX export, which preserves the document layout, is available in the intranet version.
Languages supported:
Bangla, English, Gujarati, Gurmukhi, Hindi, Kannada, Malayalam, Marathi, Oriya, Tamil, Telugu, Urdu.
Heritage languages:
Marwari and Modi.
*Support for Sanskrit and other heritage or low-resource languages (e.g., Bhili) is coming soon.
An on-screen InScript Keyboard is provided for inputting and editing.
Handling document complexities:
Colour, grayscale, skewed, scanned, single-column, and multi-column document images.
*Support for camera-captured images, uneven illumination, dewarping, and perspective correction may be added in the future.
Machine translation:
After the OCR output has been validated, the document can be translated using machine translation.
Add-on components (optional):
Phonetic keyboard with prediction support
Spellchecker

Technical Specifications

Available as a web service.

Platform Required

This solution can be deployed on local servers or in data centers.
Chitrankan OCR can be made available as a cloud service on demand.

Contact Details

AI & QT Group

Email:

mgupta[at]cdac[dot]in

hemantd[at]cdac[dot]in

Address: AI & QT group, C-DAC, CDAC Innovation Park

Panchvati, Pashan, Pune, Maharashtra 411008

Phone No.: 020-25503423