ESNOLA-based Bangla TTS

In the last decade there has been a significant trend toward developing speech synthesizers based on concatenative synthesis techniques. There are several methodologies for concatenative synthesis, such as TD-PSOLA, PSOLA, MBROLA and Epoch Synchronous Non Overlap Add (ESNOLA).

Concatenation Synthesis

In concatenation synthesis, speech is generated by combining splices of pre-recorded natural speech. To take care of context-dependency and information embedded in transition segments, the splices are selected such that they begin and end with comparatively steady states.


The ESNOLA-based system is a concatenative speech synthesis system that uses a new set of sub-phonemic signal units, called partnemes, as the smallest signal units for concatenation. The Epoch Synchronous Non Overlap Add (ESNOLA) algorithm was developed for concatenation and regeneration as well as for pitch and duration (prosodic) modification. The concatenation methodology provides adequate processing for proper matching between different segments during concatenation. Because of this special type of basic signal segment, the signal dictionary is very small, so the system can potentially be implemented on low-cost general-purpose electronic devices.

The phoneme string output by the Text Analyzer is assigned tokens based on the indexing of the segmented partneme voice signals. Pitch and amplitude are normalized so that prosody and intonation can be implemented. The selected segments are concatenated at epoch positions to obtain the raw output signal. Steady states of the nucleus vowel segment are generated by linear interpolation, with appropriate weights, between the last period of the preceding segment and the first period of the succeeding segment. The generated signals require some spectral smoothing at the points of concatenation to remove mismatches and other spectral disturbances.
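The steady-state generation step described above can be illustrated with a minimal Python sketch. This is not the Bangla Vaani implementation: the function names (`resample_period`, `interpolate_steady_state`, `concatenate`) are hypothetical, and two details are assumptions of the sketch, since the description only mentions "appropriate weights": the weights grow linearly with position, and both boundary periods are resampled to a common length before blending. Segments are assumed to be already cut at epoch positions, so they are joined without overlap.

```python
def resample_period(period, length):
    """Linearly resample one pitch period to `length` samples."""
    n = len(period)
    if length == 1:
        return [period[0]]
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)   # fractional index into `period`
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * period[lo] + frac * period[hi])
    return out


def interpolate_steady_state(last_period, first_period, n_periods):
    """Generate `n_periods` intermediate pitch periods between two boundary periods.

    Assumption: weights are linear in the position of the generated period,
    and both boundary periods are resampled to their mean length.
    """
    length = round((len(last_period) + len(first_period)) / 2)
    a = resample_period(last_period, length)
    b = resample_period(first_period, length)
    periods = []
    for k in range(1, n_periods + 1):
        w = k / (n_periods + 1)            # 0 -> preceding side, 1 -> succeeding side
        periods.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    return periods


def concatenate(preceding, steady_periods, succeeding):
    """Join segments end to end (no overlap), assuming cuts lie on epochs."""
    out = list(preceding)
    for p in steady_periods:
        out.extend(p)
    out.extend(succeeding)
    return out
```

In this sketch, calling `interpolate_steady_state(last, first, 3)` yields three periods with weights 0.25, 0.5 and 0.75 toward the succeeding segment. Spectral smoothing at the joins is not shown.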

Figure: Basic Block Diagram of TTS System using ESNOLA Technique

The block diagram above shows the basic components of the ESNOLA technique for developing a text-to-speech synthesis system.
Based on this technique, a Bangla TTS system has been developed and named "BANGLA VAANI".

System Features:

  • Low memory requirement.
  • Supports Unicode and ISCII input for Bangla text.
  • Easy to integrate with other applications.
  • Supports unlimited vocabulary with text normalization.
  • Output in 16-bit PCM format with sampling frequency 22050 Hz.
  • Runs on Windows.
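For applications that consume the synthesizer output, the format listed above (16-bit PCM at 22050 Hz) can be handled with Python's standard `wave` module. The sketch below is illustrative only: the helper name `to_pcm16_wav` is hypothetical, a single (mono) channel is assumed because the feature list does not state the channel count, and the input is assumed to be floating-point samples in the range -1.0 to 1.0.

```python
import io
import struct
import wave

SAMPLE_RATE = 22050   # Hz, as listed in the system features
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM


def to_pcm16_wav(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV byte string."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)              # assumption: mono output
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return buf.getvalue()
```

The returned bytes can be written to a `.wav` file or passed to an audio player that accepts WAV data.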

Bangla Vaani


  • A text-to-speech system for Bangla capable of synthesizing adequately intonated speech.
  • This system was used by the Election Commission of West Bengal for announcing the results of the 2006 Assembly poll.

CDAC, Kolkata is one of the active members of the DeitY-TTS Consortium (Phase II).

