OPTICAL CHARACTER RECOGNITION SYSTEM FOR INDIAN SCRIPTS

Brief Description

The OCR system for printed Indian scripts converts scanned document images into editable Unicode text. It has been developed for Hindi, Bangla, Tamil, Gurmukhi, Malayalam and Odia scripts. The OCR engine is bundled with pre-processing and post-processing algorithms to provide an end-to-end solution. The solution follows a hybrid approach and is built from platform- and technology-independent modules.
The end-to-end solution is available as a desktop product for Windows (Windows 7) and Linux (Ubuntu, Fedora) operating systems. A web-based version has also been developed to provide OCR as an online service.
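
The description above does not name the pre-processing algorithms the system uses. As an illustration only, the sketch below shows Otsu's method, a widely used global thresholding technique for turning a grey-level scan into a black-and-white image before recognition. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a list of rows of 8-bit grey values (0 = black, 255 = white).

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grey values, 0 = black, 255 = white


def otsu_threshold(pixels: Image) -> int:
    """Hypothetical helper (not the product's code): return the grey level t
    that maximises between-class variance, with pixels <= t in one class
    (dark ink) and pixels > t in the other (background)."""
    hist = [0] * 256
    for row in pixels:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # number of pixels at or below t
    sum_b = 0  # intensity sum of those pixels
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue  # no dark class yet
        if w_f == 0:
            break  # no background class left
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance up to a constant factor (1 / total**2).
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Image) -> Image:
    """Map pixels at or below the Otsu threshold to black (0), others to white (255)."""
    t = otsu_threshold(pixels)
    return [[0 if p <= t else 255 for p in row] for row in pixels]


if __name__ == "__main__":
    page = [[10, 12, 200], [11, 210, 205]]
    print(binarize(page))  # [[0, 0, 255], [0, 255, 255]]
```

Because Otsu's method picks a single global threshold, unevenly lit or degraded pages, which are common among archived documents, often need an adaptive (locally computed) threshold instead.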

Main Uses and Domain

  1. Conversion of old archived document images into text for information retrieval and indexing in Heritage Computing.
  2. Generation of Unicode text from textbook images, making the content ready for further processing such as text-to-speech (TTS) and machine translation.
  3. Data entry automation
  4. E-book generation

Features and Technical Specifications

  1. Basic image enhancement and editing tools (cropping, rotation, zoom in/zoom out, orientation, binarization, noise removal, etc.)
  2. Works on grey-level and black-and-white images
  3. Support for BMP and PNG formats
  4. Individual modules can be run successively to obtain the final OCR output
  5. End-to-end OCR
  6. More than 90% accuracy at the character level
  7. Best performance on documents scanned at 300 dpi
  8. Text editing tool coupled with a dictionary
  9. Output in Unicode format
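
The character-level accuracy figure above is quoted without a stated measurement method. One common definition, assumed here, is 1 minus the edit (Levenshtein) distance between the OCR output and a ground-truth transcription, divided by the number of characters in the ground truth. The helper names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric (hypothetical helper): 1 - edits / len(reference),
    clamped at 0. Python strings count Unicode code points."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # One substituted consonant in a six-code-point Devanagari word.
    print(round(character_accuracy("नमस्ते", "नमस्ने"), 3))  # 0.833
```

Since Python counts code points, a conjunct such as "क्ष" counts as three characters (क, ्, ष). Evaluations of Indian-script OCR sometimes count grapheme clusters or aksharas instead, which changes the denominator and therefore the reported percentage.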

Platform Required (if any)

  1. Desktop version: Windows 7 and above
  2. Desktop version: Linux (Ubuntu, Fedora)
  3. Web-based


Contact Details for Techno Commercial Information

  1. Mr. Tushar Patnaik
    Principal Technical Officer, SNLP Lab, CDAC-Noida
    Email ID: tusharpatnaik[at]cdac[dot]in
  2. Prof. Santanu Chaudhury
    Director, CEERI PILANI
    Email ID: director[at]ceeri[dot]ernet[dot]in