Chitrankan OCR for Indian languages
In today's knowledge-based society, digital information plays a key role. For information to be easily accessible, it must be converted into digital form. Over time, organizations accumulate vast volumes of legacy data and newly created records in physical format, and retrieving information quickly from such physical records becomes almost impossible. OCR (Optical Character Recognition) technology is the way to make this conversion possible.
Our vast treasure troves of literary heritage, comprising printed books, manuscripts, documents, etc., are becoming increasingly difficult to protect and preserve. Over the years these documents become so fragile that they cannot even be held properly and read. To maintain and preserve the information in old documents such as newspaper cuttings and critical academic and technical works, the solution lies in digitization.
Even digitally created documents (printed newspapers, magazines, corporate reports, etc.) need to be converted into machine-readable text before data analytics applications can use them to solve various problems. Modern OCR techniques make it possible to preserve information, make offices ‘paper-free’, and make information easily searchable.
Chitrankan (Powered by AI) uses Optical Character Recognition (OCR) technology to make data entry faster with less human effort, and it also provides machine translation support once the OCR output has been validated. OCR converts document images into digital text through automatic text line detection and recognition. The OCR web application can be accessed at https://chitrankan.ebhasha.in/. Simply click the link or scan the QR code given below to go directly to the site.
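To make the idea of "automatic text line detection" concrete, here is a minimal sketch of one classical approach, the horizontal projection profile. It is an illustration only and does not claim to be the method Chitrankan uses. The function name `detect_text_lines`, the `min_ink` parameter, and the tiny sample page are all hypothetical. The image is assumed to be already binarized, with 1 for ink and 0 for background.

```python
from typing import List, Tuple


def detect_text_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Find text-line bands in a binarized page using a horizontal projection profile.

    Returns (start_row, end_row) pairs, with end_row exclusive, one per
    consecutive run of rows that contain at least `min_ink` ink pixels.
    """
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the band currently being tracked, if any
    for y, row in enumerate(image):
        ink = sum(row)  # projection value: number of ink pixels in this row
        if ink >= min_ink and start is None:
            start = y  # a new text line begins here
        elif ink < min_ink and start is not None:
            bands.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:
        bands.append((start, len(image)))  # the page ended inside a text line
    return bands


# Hypothetical 5x4 page: two rows of text, a blank row, then one more row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(detect_text_lines(page))  # [(1, 3), (4, 5)]
```

Each band would then be cropped and passed to a recognizer. Real pages usually need skew correction and column segmentation before this step, because a tilted or multicolumn page smears the profile.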
Use Cases
- Office automation
- Archival of text matters
- Business card reader
- Data entry
- E-book generation
- Searchable menu
- Signboards
- Number plates
Domain:
- Banking/Legal/Healthcare/Education/Finance/Government agencies
Salient Features
- Input image formats:
PDF, BMP, JPG, JPEG, PNG, TIFF
- Output formats:
Allows users to export recognized Unicode text into various output formats like TXT, UTF-8.
*.docx support is available in the intranet version for layout preservation.
- Languages supported:
Bangla, English, Gujarati, Gurumukhi, Hindi, Kannada, Malayalam, Marathi, Oriya, Tamil, Telugu, Urdu.
- Heritage languages:
Marwari and Modi.
*Language support for Sanskrit and other heritage or low-resource languages (Bhili) coming soon ...
- On-screen Inscript Keyboard is provided for inputting and editing.
- Handling document complexities:
Colour, gray, skewed, scanned, single-column, and multicolumn document images.
*Support for camera-captured and unevenly illuminated images, dewarping, and perspective correction might come in the future.
- Machine translation:
After validation of the OCR output, machine translation can be used to translate documents.
- Add-on components (optional):
Phonetic keyboard with prediction support
Spellchecker
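The input and output formats listed above can be mirrored in a small client-side helper that checks a file before upload and saves validated text afterwards. This is a minimal sketch under stated assumptions: the names `SUPPORTED_INPUT_EXTENSIONS`, `is_supported_input`, and `save_as_utf8_txt` are hypothetical and are not part of any Chitrankan API. Only the extension list comes from the features above.

```python
import tempfile
from pathlib import Path

# Extensions taken from the "Input image formats" entry above.
SUPPORTED_INPUT_EXTENSIONS = {".pdf", ".bmp", ".jpg", ".jpeg", ".png", ".tiff"}


def is_supported_input(filename: str) -> bool:
    """Return True if the file extension is one of the listed input formats."""
    return Path(filename).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


def save_as_utf8_txt(text: str, out_path: str) -> None:
    """Write recognized Unicode text to a plain TXT file encoded as UTF-8."""
    Path(out_path).write_text(text, encoding="utf-8")


# Usage: check a scan's extension, then store some (hypothetical) validated text.
print(is_supported_input("scan.PNG"))    # True
print(is_supported_input("notes.docx"))  # False
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "page1.txt"
    save_as_utf8_txt("हिन्दी", str(target))
    print(target.read_text(encoding="utf-8"))
```

Passing the encoding explicitly matters for Indian-language text, because the platform default encoding is not UTF-8 on every system.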
Technical Specifications
- Available as a web service.
Platform Required (if any)
- This solution is deployable over Local Servers / Data Centers.
- Chitrankan OCR can be made available as cloud services on-demand.
Contact Details for Techno Commercial Information
GIST Group
Email:
info.gist@cdac.in for information on GIST products
sales.gist@cdac.in for sales related information
support.gist@cdac.in for support related information
Address: GIST group, C-DAC, CDAC Innovation Park
Panchvati, Pashan, Pune, Maharashtra 411008
Phone No.: 020-25503475