Product Information

TEXT INDEPENDENT AUTOMATIC SPEAKER RECOGNITION USING VOICE BIOMETRIC

A system that automatically identifies and verifies persons from their voice in a controlled scenario.

Brief Description

The system consists of three main modules: Signal Processing, Training and Testing. First, for both training and testing, the digitized speech is passed through Acoustic Feature Extraction and Voicing Detection. Each feature vector is composed of the 12 lowest Mel-Frequency Cepstral Coefficients (MFCC). A pitch extraction procedure is then used to retain MFCC feature frames only from the detected voiced regions, and these frames are used for speaker training and testing. Alongside this, a frequency distribution of pitch values is prepared for each speaker within the normal human voicing range (generally 80 to 420 Hz). Later, during testing, this distribution is used for Pitch Based Dynamic Pruning (PBDP) of unlikely speakers before Matching.
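The voiced-frame selection and pitch-based pruning described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names, the 10 Hz bin width, the histogram-intersection similarity and the shortlist size `keep` are all assumptions, since the text does not state how PBDP compares distributions. MFCC and pitch values are assumed to be already computed per frame, with unvoiced frames carrying a pitch outside the 80 to 420 Hz range.

```python
import numpy as np

PITCH_MIN_HZ, PITCH_MAX_HZ = 80.0, 420.0  # voicing range quoted in the text
N_BINS = 34  # assumed bin count, giving 10 Hz per bin


def select_voiced_frames(mfcc, pitch_hz):
    """Keep only the MFCC frames whose pitch lies in the voicing range."""
    mfcc = np.asarray(mfcc, dtype=float)
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    voiced = (pitch_hz >= PITCH_MIN_HZ) & (pitch_hz <= PITCH_MAX_HZ)
    return mfcc[voiced], pitch_hz[voiced]


def pitch_histogram(pitch_hz):
    """Normalized frequency distribution of pitch values for one speaker."""
    counts, _ = np.histogram(
        pitch_hz, bins=N_BINS, range=(PITCH_MIN_HZ, PITCH_MAX_HZ)
    )
    total = counts.sum()
    return counts / total if total > 0 else np.zeros(N_BINS)


def prune_speakers(test_pitch_hz, speaker_hists, keep=3):
    """Return the `keep` speakers whose pitch distribution best overlaps
    the test utterance (histogram intersection, an assumed measure)."""
    test_hist = pitch_histogram(test_pitch_hz)
    overlap = {
        name: float(np.minimum(test_hist, hist).sum())
        for name, hist in speaker_hists.items()
    }
    return sorted(overlap, key=overlap.get, reverse=True)[:keep]
```

In this sketch, `speaker_hists` would be built once per enrolled speaker during training, and `prune_speakers` would be called on the test utterance's voiced pitch values to obtain the shortlist that is passed on to Matching.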

During training, individual speaker models, or Codebooks, are created by clustering the training feature vectors into a small number of related clusters represented by Code-vectors, using the well-known unsupervised Vector Quantization based clustering algorithm. Weights are then assigned to all the code-vectors using a Speaker Discriminative Weighting scheme, such that code-vectors with higher discriminating power receive larger weights and vice versa. In the testing module, a list of Most Likely speakers is first created by PBDP. Final matching scores are then calculated between those speakers’ weighted codebooks and the voiced MFCC frames of the test speech signal. The codebook that maximizes the similarity measure (i.e. has the highest matching score) is the best matching codebook, and its speaker is the identified Speaker.
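The training and matching flow above can be illustrated with a small sketch. Several parts are assumptions made only for illustration: plain k-means stands in for the Vector Quantization clustering algorithm, the weighting rule (a code-vector's distance to the nearest code-vector of any other speaker) is a hypothetical stand-in for the Speaker Discriminative Weighting scheme, and the similarity measure (weight divided by distance, averaged over frames) is one plausible choice rather than the documented one. All function names are hypothetical.

```python
import numpy as np


def train_codebook(features, n_codes=4, n_iter=20, seed=0):
    """Cluster feature vectors into n_codes code-vectors (plain k-means,
    used here as a stand-in for the VQ clustering step)."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    start = rng.choice(len(features), size=n_codes, replace=False)
    codes = features[start].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - codes[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(n_codes):
            members = features[nearest == k]
            if len(members):  # leave an empty cluster's code-vector unchanged
                codes[k] = members.mean(axis=0)
    return codes


def discriminative_weights(codebooks):
    """Assumed weighting: code-vectors far from every other speaker's
    code-vectors get larger weights. Needs at least two speakers."""
    weights = {}
    for name, codes in codebooks.items():
        others = np.vstack([c for n, c in codebooks.items() if n != name])
        d = np.linalg.norm(codes[:, None, :] - others[None, :, :], axis=2)
        nearest_other = d.min(axis=1)
        weights[name] = nearest_other / nearest_other.sum()
    return weights


def match_score(frames, codes, weights, eps=1e-9):
    """Assumed similarity: mean over frames of weight / distance to the
    nearest code-vector, so closer and more discriminative matches score higher."""
    d = np.linalg.norm(frames[:, None, :] - codes[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    nearest_dist = d[np.arange(len(frames)), nearest]
    return float(np.mean(weights[nearest] / (nearest_dist + eps)))


def identify(frames, codebooks, weights, candidates):
    """Score only the shortlisted candidates and return the best match."""
    frames = np.asarray(frames, dtype=float)
    scores = {n: match_score(frames, codebooks[n], weights[n]) for n in candidates}
    return max(scores, key=scores.get), scores
```

Here `candidates` would be the list of Most Likely speakers produced by PBDP, so that matching scores are only computed against the shortlisted codebooks.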

Use Cases

Voice-biometrics-based office attendance
Remote vote casting via telephone calls
E-commerce (purchase of goods)
Secure access to mobiles and handhelds
Door Access Control in smart homes

Salient Features

Easy to use: speech is a behavioral biometric that is easily available, user friendly and less intrusive
High acceptability: low cost, low storage requirement, compact enough for small electronic devices/handhelds
Text and language independent: no specific text is required; accepts any valid utterance of varying length in any language
Less interaction time: performs well with only 2 minutes of enrolment speech and 5 seconds of test speech

Technical Specifications

The application uses Voice Biometrics, so there is no need to carry keys/badges/access cards or remember passwords/PINs.
Speech is remotely accessible, so the same technology can be used for remote authentication via telephone.
The method is scalable to recognizing multiple speakers or verifying the same speaker across audio recordings in different languages.

Platform Required

Software:

Microsoft Windows XP Professional or later

For remote access

Linux Operating System
Asterisk Gateway Interface (AGI)
PHP
MySQL

Hardware:

Standard desktop with one good-quality noise-cancelling microphone

For remote access

ISDN-PRI / E1 Channel
Asterisk Server

Contact Details

Advance Signal Processing Group, Speech Processing Section

Mr. Joyanta Basu

Email: joyanta.basu[at]cdac[dot]in

Phone No.: 033-2357-9846, Ext: 226 (O)