Tata Elxsi’s speech team has extensive experience in developing and optimizing speech compression codecs on a range of DSPs. The team has worked with many telecom companies on medium to large projects, and our codecs run in several commercially available products.
Tata Elxsi has expertise in optimizing and porting the following speech codecs to processors such as ARM7, ARM9, TI DSPs, MIPS and Intel processors. Each codec is optimized, fully compliant with the ITU-T and 3GPP standards, and tested for multi-instance operation and memory leaks.
Tata Elxsi has developed the following speech processing components. These components have been tested for quality and ported to ARM and TI DSPs.
| |
|
 |
G.711
G.711 uses one of two companding algorithms, A-law or Mu-law, both of which apply non-linear quantization to speech samples. 16-bit PCM samples taken at 8 kHz are compressed to 8 bits each, giving a bit rate of 64 kbps.
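The non-linear quantization can be illustrated with the continuous Mu-law companding curve. This is a minimal sketch only: it uses the textbook formula with a uniform 8-bit quantizer, not the bit-exact segmented tables of the ITU-T recommendation, and the function names are assumptions made for this example.

```python
import math

MU = 255.0  # Mu-law constant commonly used for 8-bit telephony companding

def mulaw_compress(x: float) -> float:
    """Map a normalized sample in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_sample(pcm16: int) -> int:
    """Compand a 16-bit PCM sample and quantize it to an 8-bit code (0..255)."""
    y = mulaw_compress(pcm16 / 32768.0)
    return int(round((y + 1.0) / 2.0 * 255))

def decode_sample(code: int) -> int:
    """Expand an 8-bit code back to an approximate 16-bit PCM sample."""
    y = code / 255.0 * 2.0 - 1.0
    return int(round(mulaw_expand(y) * 32767))

SAMPLE_RATE_HZ = 8000
BITS_PER_CODE = 8
BIT_RATE_BPS = SAMPLE_RATE_HZ * BITS_PER_CODE  # 8000 samples/s * 8 bits
```

Because the curve is logarithmic, small samples are quantized with finer steps than large ones, which is what lets 8 bits per sample cover the dynamic range of speech.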
Technical Specification
|
- 8 kHz sampling frequency
- Bitrate of 64 kbps
- A-law or Mu-law compression algorithm
- Compliant with ITU-T G.711 specification
- Optimized implementation
|
| |
| Applications |
- Wireless systems
- VoIP applications
- Telephone networks
|
|
 |
G.722
The G.722 standard uses the Sub-Band Adaptive Differential Pulse Code Modulation (SB-ADPCM) coding technique. It provides 7 kHz wideband audio at data rates of 48, 56 and 64 kbps. The frequency band is split into a higher and a lower sub-band, and each sub-band is encoded using ADPCM.
| |
| Technical Specification |
- Sampling frequency of 16 kHz
- Bit rates of 48, 56 and 64 kbps
- Bandwidth of 7 kHz
- Optimized implementation
- Compliant with ITU-T standard
|
| |
| Applications |
- Wideband IP telephony
- Audio Conferencing
|
 |
 |
G.723.1
G.723.1 is a dual-rate speech codec with 16-bit linear PCM input and output. The encoder is based on the principles of linear prediction analysis-by-synthesis coding and attempts to minimize a perceptually weighted error signal. It operates on frames of 240 samples each, which equals 30 ms at an 8 kHz sampling rate. The high bit rate uses Multi-pulse Maximum Likelihood Quantization (MP-MLQ) excitation, and the low bit rate uses Algebraic Code-Excited Linear Prediction (ACELP) excitation.
| |
| Technical Specification |
- Sampling frequency of 8 kHz
- Supports bit rates 6.3 kbps and 5.3 kbps
- Fixed frame size of 30 ms
- Optimized Implementation
|
| |
| Applications |
- Voice over Internet (VoIP) applications
- Wireless systems
- Wideband telephony
- Audio conferencing
- Multimedia services at low bitrate
|
 |
 |
G.726
The G.726 codec converts a 64 kbps A-law or Mu-law channel to 16, 24, 32 or 40 kbps using Adaptive Differential Pulse Code Modulation (ADPCM). The sampling frequency is 8 kHz.
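The relationship between the four bit rates and the 8 kHz sampling frequency is simple arithmetic: each rate corresponds to a whole number of ADPCM bits per sample. The sketch below only illustrates that arithmetic; the helper name is an assumption made for this example and is not part of any codec API.

```python
SAMPLE_RATE_HZ = 8000
G726_RATES_BPS = (16000, 24000, 32000, 40000)
PCM_CHANNEL_BPS = 64000  # the A-law / Mu-law channel being converted

def adpcm_bits_per_sample(bit_rate_bps: int) -> int:
    """Return how many bits encode each sample at the given bit rate."""
    if bit_rate_bps % SAMPLE_RATE_HZ != 0:
        raise ValueError("bit rate must be a multiple of the sampling rate")
    return bit_rate_bps // SAMPLE_RATE_HZ

bits_per_sample = {rate: adpcm_bits_per_sample(rate) for rate in G726_RATES_BPS}

# Fraction of the 64 kbps channel bandwidth that each rate occupies.
bandwidth_fraction = {rate: rate / PCM_CHANNEL_BPS for rate in G726_RATES_BPS}
```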
| |
| Technical Specification |
- Sampling frequency of 8 kHz
- Bit rates 16/24/32/40 kbps
- Optimized implementation
- Compliant with ITU-T standard
|
| |
| Applications |
- VoIP systems
- Communication systems
|
 |
 |
G.729AB
The G.729AB codec encodes speech signals at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP). It processes 16-bit linear PCM speech sampled at 8 kHz. The encoder analyzes the speech signal to extract the parameters of the CELP model, which are encoded and transmitted in a bit stream. The decoder uses the received parameters to retrieve the excitation and the synthesis filter coefficients, and reconstructs the speech by filtering the excitation through the synthesis filter. The codec operates on 10 ms frames with a 5 ms look-ahead for linear prediction (LP) analysis, so the overall algorithmic delay is 15 ms. Annex B provides a VAD/CNG/DTX algorithm that reduces the transmission rate during silence periods.
|
| |
| Technical Specification |
- Sampling frequency of 8 kHz
- CS-ACELP speech coding
- Bit rate of 8 kbps
- DTX/VAD/CNG support
- Compliant with ITU-T standard
|
| |
| Applications |
- Voice over Internet (VoIP) applications
- Wireless communications
- Satellite communication systems
|
 |
 |
AMR-NB
The Adaptive Multi-Rate Narrow-Band (AMR-NB) speech codec consists of a multi-rate speech coder, a source-controlled rate scheme that includes a voice activity detector (VAD) and a comfort noise generation (CNG) system, and an error concealment mechanism to combat the effects of transmission errors and lost packets. The multi-rate speech coder is a single integrated speech codec with eight source rates from 4.75 kbit/s to 12.2 kbit/s, plus a low-rate background noise encoding mode. The speech coder can switch its bit rate every 20 ms speech frame upon command. The speech codec takes its input as a 13-bit uniform Pulse Code Modulated (PCM) signal, either from the audio part of the UE or, on the network side, from the Public Switched Telephone Network (PSTN) via an 8-bit A-law or µ-law to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is packetized and delivered to the network interface. In the receive direction, the inverse operations take place. The coding scheme is multi-rate Algebraic Code Excited Linear Prediction (ACELP). AMR is now widely used in GSM and UMTS, where link adaptation selects one of the eight bit rates based on link conditions.
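The per-frame rate switching can be pictured as a table lookup driven by a link quality measurement. In the sketch below the eight source rates are the AMR-NB modes, but the quality thresholds and the `select_mode` function are purely hypothetical values chosen for illustration; in a real network the adaptation thresholds are set by the operator.

```python
AMR_NB_RATES_BPS = (4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200)
FRAME_MS = 20  # the coder can change mode at every 20 ms frame

def bits_per_frame(rate_bps: int) -> int:
    """Number of speech bits produced per 20 ms frame at the given rate."""
    return rate_bps * FRAME_MS // 1000

# HYPOTHETICAL thresholds: minimum link quality (in dB) for each mode.
HYPOTHETICAL_MIN_QUALITY_DB = (0, 2, 4, 6, 8, 10, 13, 16)

def select_mode(link_quality_db: float) -> int:
    """Pick the highest rate whose (hypothetical) threshold is satisfied.

    Falls back to the lowest rate when the link is worse than every threshold.
    """
    chosen = AMR_NB_RATES_BPS[0]
    for rate, threshold in zip(AMR_NB_RATES_BPS, HYPOTHETICAL_MIN_QUALITY_DB):
        if link_quality_db >= threshold:
            chosen = rate
    return chosen
```

On a poor link the lower rates leave more of the channel budget for error protection, which is the motivation for adapting the mode rather than fixing it.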
|
| |
| Technical Specification |
- Sampling frequency of 8 kHz
- Bitrate from 4.75 to 12.2 kbps
- VAD/CNG/DTX support
- Fully compatible with GSM AMR standards
- Optimized implementation
|
| |
| Applications |
- VoIP systems
- Mobile telephony
- Streaming Media Servers
|
 |
 |
AMR-WB
Adaptive Multi-Rate Wideband (AMR-WB) is a speech coding standard based on the ACELP coding method. The codec supports wideband speech of 50 to 7000 Hz, which improves speech quality over narrowband codecs. AMR-WB supports nine bit rates from 6.6 to 23.85 kbps. The codec uses 16-bit PCM samples at a sampling frequency of 16 kHz and processes 20 ms frames. It also supports Voice Activity Detection (VAD).
|
| |
| Technical Specification |
- Sampling frequency of 16 kHz
- Bit rates 6.6 to 23.85 kbps
- Linear PCM output
- Voice Activity Detection
- IF1, IF2 format support
- Conformance to 3GPP standard
- Supports 3gp file format
- Optimized implementation
|
| |
| Applications |
- Mobile communications
- ISDN wideband telephony
- VoIP applications
- Video conferencing
|
 |
 |
AMR-WB+
Extended Adaptive Multi-Rate Wideband (AMR-WB+) is an extension of the AMR-WB codec, designed to compress speech and audio signals at low bit rates with good quality. The standard is specified by 3GPP, which originally developed the AMR-WB+ audio codec for streaming and messaging services in GSM and 3G systems. AMR-WB+ supports mono rates up to 36 kbps and stereo rates up to 48 kbps.
|
| |
| Technical Specification |
- Supports speech and audio coding
- Sampling frequency of 16 to 48 kHz
- Bit rates from 6.6 kbps to 23.85 kbps for speech, up to 36 kbps for mono and up to 48 kbps for stereo
- Supports 3gp file format
- Optimized implementation
|
| |
| Applications |
- Packet Switched Streaming services
- Multimedia Messaging Services
- Multimedia Broadcast and Multicast Service
|
 |
 |
iLBC
The Internet Low Bitrate Codec (iLBC) is a narrowband speech codec suitable for VoIP applications. It uses a block-independent linear predictive coding algorithm and supports frame sizes of 20 ms and 30 ms, with bit rates of 15.2 kbps and 13.33 kbps respectively. iLBC handles lost frames through graceful degradation of speech quality, which makes it useful for robust voice communication over IP.
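The two bit rates follow directly from the encoded payload size of each frame: RFC 3951 defines 38 bytes for a 20 ms frame and 50 bytes for a 30 ms frame. The short sketch below checks that arithmetic; the function names are assumptions made for this example.

```python
SAMPLE_RATE_HZ = 8000

# Encoded payload size per frame, in bytes, keyed by frame length in ms.
ILBC_PAYLOAD_BYTES = {20: 38, 30: 50}

def samples_per_frame(frame_ms: int) -> int:
    """Number of PCM samples consumed by one frame."""
    return SAMPLE_RATE_HZ * frame_ms // 1000

def bit_rate_kbps(frame_ms: int) -> float:
    """Bits per millisecond equals kilobits per second."""
    return ILBC_PAYLOAD_BYTES[frame_ms] * 8 / frame_ms
```

The 30 ms mode spends fewer bits per second at the cost of 10 ms more framing delay, which is the trade-off behind offering both frame sizes.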
| |
| Technical Specification |
- 8 kHz sampling rate
- Supports 20 ms and 30 ms frame sizes
- Fixed bit rate of 15.2 kbps for 20 ms frames and 13.33 kbps for 30 ms frames
- Robust over packet loss
- Optimized implementation
- Compliant with IETF specification as per RFC3951
|
| |
| Applications |
- VoIP applications
- Streaming Audio
|
 |
 |
AGC
Tata Elxsi offers an Automatic Gain Control (AGC) algorithm that can be used in a range of speech applications. Based on the input signal energy level, the algorithm adjusts the gain at variable rates to maintain a constant output level: weaker signals receive more gain, and stronger signals receive less gain or none at all.
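A frame-based gain loop of this kind can be sketched as follows. This is a generic illustration, not the Tata Elxsi algorithm: the function name, the default target level, the attack and release rates and the gain cap are all assumptions chosen for the example.

```python
import math

def agc_frame(frame, gain, target_rms=3000.0, attack=0.5, release=0.1,
              max_gain=8.0, voice_active=True):
    """Apply AGC to one frame of 16-bit samples.

    Returns (output_frame, updated_gain). The gain is only adapted while
    voice_active is True, so background noise is not amplified in pauses.
    """
    if voice_active and frame:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > 0.0:
            desired = min(target_rms / rms, max_gain)
            # Reduce gain quickly (attack) and raise it slowly (release).
            rate = attack if desired < gain else release
            gain += rate * (desired - gain)
    # Scale and saturate to the 16-bit range.
    out = [max(-32768, min(32767, int(round(s * gain)))) for s in frame]
    return out, gain
```

A caller would keep `gain` as per-channel state and pass it back in with each new frame, together with the VAD decision for that frame.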
|
| |
| Technical Specification |
- Supports 8 kHz, 16 kHz and 32 kHz sampling rates
- Configurable output level & reference level
- Configurable gain settings
- Multi channel support
- Requires VAD information
- Low memory requirements
- C Callable APIs
|
| |
| Applications |
- VoIP systems
- Digital Telephony
|
 |
 |
Mixer
Audio conferencing systems use speech mixers to combine the audio streams received from the channels participating in a conference. The TEL Mixer is an optimized implementation that supports mixing multiple channels.
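At its core, mixing is a gain-weighted sum of the channels followed by saturation to the sample range. The sketch below shows that idea under stated assumptions: `mix_frames` and its parameters are hypothetical names for this example and do not describe the actual TEL Mixer API.

```python
def mix_frames(frames, channel_gains=None, output_gain=1.0):
    """Mix equal-length frames of 16-bit samples into one output frame.

    frames        -- list of per-channel sample lists
    channel_gains -- optional per-channel gain (defaults to 1.0 each)
    output_gain   -- gain applied to the summed signal
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all channels must supply frames of equal length")
    if channel_gains is None:
        channel_gains = [1.0] * len(frames)
    if len(channel_gains) != len(frames):
        raise ValueError("one gain is required per channel")

    mixed = []
    for i in range(n):
        acc = sum(g * f[i] for f, g in zip(frames, channel_gains))
        sample = int(round(acc * output_gain))
        # Saturate instead of wrapping around on overflow.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

The function keeps no state between calls, so it can be used for any number of conferences at once.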
|
| |
| Technical Specification |
- Configurable Output gain and Channel Gain
- C callable API for initialization & Mixing
- Multi Channel, Reentrant implementation
- Optimized implementation
|
| |
| Applications |
- Audio Conferencing Systems
|
 |
 |
DTMF
DTMF (Dual-Tone Multi-Frequency) signaling is used in touch-tone dialing. Each digit is represented by a pair of tones: one frequency from a "low" group (697 Hz, 770 Hz, 852 Hz, 941 Hz) and one from a "high" group (1209 Hz, 1336 Hz, 1477 Hz, 1633 Hz). The DTMF receiver detects the presence of these tones, and the DTMF remover replaces a detected tone with either silence or white noise.
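Tone detection of this kind is commonly built on the Goertzel algorithm, which measures signal power at one frequency at a time. The sketch below is a minimal illustration and not the Tata Elxsi detector: the function names, the 205-sample block length, and the default level and twist limits are assumptions chosen for the example.

```python
import math

LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")  # rows: low group, columns: high group

def goertzel_power(samples, freq_hz, sample_rate_hz=8000):
    """Return the signal power at freq_hz using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_digit(samples, min_power=1e9, max_twist_db=8.0, sample_rate_hz=8000):
    """Return the detected key, or None if no valid tone pair is present.

    min_power must be positive; both it and max_twist_db are illustrative.
    """
    low = [goertzel_power(samples, f, sample_rate_hz) for f in LOW_GROUP_HZ]
    high = [goertzel_power(samples, f, sample_rate_hz) for f in HIGH_GROUP_HZ]
    row = low.index(max(low))
    col = high.index(max(high))
    if low[row] < min_power or high[col] < min_power:
        return None  # LEVEL check: at least one tone is too weak
    twist_db = abs(10.0 * math.log10(high[col] / low[row]))
    if twist_db > max_twist_db:
        return None  # TWIST check: the two tones differ too much in level
    return KEYPAD[row][col]

def make_tone(low_hz, high_hz, n=205, amplitude=8000.0, sample_rate_hz=8000):
    """Generate a test block containing the sum of two sine tones."""
    w_low = 2.0 * math.pi * low_hz / sample_rate_hz
    w_high = 2.0 * math.pi * high_hz / sample_rate_hz
    return [amplitude * (math.sin(w_low * i) + math.sin(w_high * i))
            for i in range(n)]
```

For example, `detect_digit(make_tone(770, 1336))` identifies the key in the second row and second column of the keypad, while a block of silence yields `None`.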
|
| |
| Technical Specification |
- Supports DTMF detection and removal
- Conformance to ITU Q.23 and Q.24 standards
- Configurable DTMF minimum frame length
- Configurable LEVEL & TWIST values
- Multi Channel, Reentrant implementation
- C callable API for initialization, Tone Detection and Tone Removal
- Optimized implementation
|
| |
| Applications |
- Voice Mails, Voice Response Systems
- Remote control of computer and telephone equipment
|
 |
 |
VAD
Voice Activity Detection (VAD) is used to reduce the transmission rate during inactive speech periods while maintaining an acceptable level of output quality. VAD classifies the input signal as active speech or inactive speech.
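A simple way to picture this classification is an energy detector with an adaptive noise floor and a hangover period that smooths the transition from active to inactive speech. The sketch below is a generic illustration rather than the Tata Elxsi algorithm: the class name, threshold ratio, hangover length and adaptation rate are all assumptions.

```python
class EnergyVad:
    """Frame-based energy VAD with noise tracking and hangover smoothing."""

    def __init__(self, threshold_ratio=4.0, hangover_frames=5, noise_adapt=0.05):
        self.threshold_ratio = threshold_ratio  # speech must exceed noise by this factor
        self.hangover_frames = hangover_frames  # frames kept active after speech ends
        self.noise_adapt = noise_adapt          # smoothing factor for the noise floor
        self.noise_energy = None
        self.hangover = 0

    def process(self, frame) -> bool:
        """Return True if the frame is classified as active speech."""
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if self.noise_energy is None:
            # Assumption: the very first frame contains only background noise.
            self.noise_energy = energy
            return False
        if energy > self.threshold_ratio * max(self.noise_energy, 1.0):
            self.hangover = self.hangover_frames
            return True
        # Below threshold: let the noise estimate follow the background level.
        self.noise_energy += self.noise_adapt * (energy - self.noise_energy)
        if self.hangover > 0:
            # Keep reporting speech briefly to avoid clipping word endings.
            self.hangover -= 1
            return True
        return False
```

Each channel would own its own `EnergyVad` instance, which keeps the per-channel state separate.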
| |
| Technical Specification |
- Smoothing algorithm to avoid clipping of speech at active to inactive speech transition regions
- Adapts to changing background noise
- Reentrant, multi-channel implementation
- Configurable frame size
- Optimized algorithm
- C callable APIs
|
| |
| Applications |
- Digital telephony
- Digital Telephone Answering Machines
|
 |
 |
Packet Loss Concealment (PLC)
Packet Loss Concealment is used to reconstruct lost frames or packets in digital telephony systems. It mainly helps to avoid the artifacts and glitches caused by packet loss. Tata Elxsi’s Packet Loss Concealment algorithm is based on ITU-T standard G.711 Appendix I and operates on PCM data.
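The basic idea of concealment on PCM data can be shown with a much simpler technique than Appendix I: repeat the last good frame and fade it out over consecutive losses. The sketch below is that simplified stand-in only. It performs no pitch detection or overlap-add, and the class name and parameters are assumptions made for this example.

```python
class SimplePlc:
    """Conceal lost frames by repeating the last good frame with a fade-out."""

    def __init__(self, frame_len=80, attenuation_per_loss=0.8):
        # 80 samples corresponds to 10 ms at an 8 kHz sampling rate.
        self.last_good = [0] * frame_len
        self.attenuation_per_loss = attenuation_per_loss
        self.gain = 1.0

    def good_frame(self, frame):
        """Store a correctly received frame and pass it through unchanged."""
        self.last_good = list(frame)
        self.gain = 1.0  # reset the fade once good data arrives again
        return list(frame)

    def lost_frame(self):
        """Synthesize a replacement for a lost frame."""
        self.gain *= self.attenuation_per_loss
        return [int(round(s * self.gain)) for s in self.last_good]
```

Fading the repeated frame avoids a long buzzing artifact when several packets in a row are lost.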
| |
| Technical Specification |
- Low-complexity concealment algorithm based on ITU-T G.711 Appendix I
- Improved Quality
- Re-entrant, multi-channel implementation
- C Callable APIs for PLC initialization and PLC
|
| |
| Applications |
|
 |