Thursday, 4 August 2011

Voice Codecs

Studies of the human voice show that it contains frequencies from 0 to 7 kHz. However, most of the voice energy lies below 4 kHz. So legacy telephone systems applied a Low-Pass Filter (LPF) with a cutoff frequency of 4 kHz to the audio signal at the sender side. According to the sampling theorem, the signal can only be reconstructed at the receiving side if the sampling rate is at least twice the highest frequency it contains: minimum sampling rate = 2 × 4 kHz = 8 kHz.
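To see why the sampling rate must be at least twice the highest frequency, here is a small Python sketch (the tone frequencies are arbitrary choices for illustration). Sampled at 8 kHz, a 5 kHz tone, which lies above the 4 kHz limit, produces exactly the same samples as a 3 kHz tone, so the receiver cannot tell them apart. This effect is called aliasing, and it is why the LPF is applied before sampling.

```python
import math

SAMPLE_RATE_HZ = 8000  # narrow-band telephony sampling rate

def sample_tone(freq_hz, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sample cos(2*pi*f*t) at the instants t = n / sample_rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

above_limit = sample_tone(5000, 16)  # 5 kHz: above the 4 kHz limit
alias = sample_tone(3000, 16)        # 3 kHz = 8 kHz - 5 kHz

# The two sample sequences agree to within floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(above_limit, alias)))
# prints True
```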

Sampling is done by an Analog-to-Digital Converter (ADC). Each sample is quantized according to the signal amplitude into one of 256 different values, i.e. each sample is represented by 8 bits. The bit rate of the raw, uncompressed audio (known as Pulse Code Modulation, PCM) is 8 bits/sample × 8000 samples/second = 64 kbit/s.
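A minimal Python sketch of the two steps in this paragraph: a uniform 8-bit quantizer that maps a normalized amplitude in [-1.0, 1.0] to one of 256 integer codes, followed by the bit-rate arithmetic. The normalized input range and the round-to-nearest rule are assumptions made for illustration; real converters and telephony codecs differ in detail (G.711, for instance, uses non-uniform quantization).

```python
SAMPLE_RATE_HZ = 8000          # samples per second
BITS_PER_SAMPLE = 8
LEVELS = 2 ** BITS_PER_SAMPLE  # 256 quantization levels

def quantize(x):
    """Uniformly map an amplitude in [-1.0, 1.0] to an integer code 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return int((x + 1.0) / 2.0 * (LEVELS - 1) + 0.5)  # round to nearest level

def bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample

print(quantize(-1.0), quantize(0.0), quantize(1.0))   # prints 0 128 255
print(bit_rate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # prints 64000
```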

At that stage, the PCM audio can be “encoded” (“compressed”) using a “codec” algorithm at the sender and “decoded” (“decompressed”) at the receiving end using the same codec algorithm (or, to be precise, its reverse).

There is a variety of codecs. Each is designed with certain assumptions and objectives in mind. Some are free, and some have royalty fees attached to them. One of the main objectives of compression is to reduce the network bandwidth required for a VoIP conversation: the more the voice is compressed, the more concurrent calls can take place over a fixed bandwidth. However, a codec with higher compression may not produce the best audio quality, for example if the assumptions used in its design are not met by the underlying IP network.
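As a rough illustration of how compression translates into concurrent calls, the Python sketch below divides a link's capacity by the per-call bit rate. It assumes 20 ms packets and IPv4 + UDP + RTP headers (40 bytes per packet), ignores link-layer overhead, and counts one direction only; the 1000 kbit/s link is a made-up example. G.711 carries 64 kbit/s of voice payload and G.729 carries 8 kbit/s.

```python
# IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes) headers per packet;
# link-layer (e.g. Ethernet) overhead is deliberately ignored here.
HEADER_BYTES = 20 + 8 + 12

def per_call_kbps(codec_kbps, packet_ms=20):
    """One-direction bandwidth of a call: codec payload plus header overhead."""
    packets_per_second = 1000 / packet_ms
    overhead_kbps = HEADER_BYTES * 8 * packets_per_second / 1000
    return codec_kbps + overhead_kbps

def max_calls(link_kbps, codec_kbps, packet_ms=20):
    """How many calls fit in one direction of a link of link_kbps."""
    return int(link_kbps // per_call_kbps(codec_kbps, packet_ms))

for name, rate in (("G.711", 64), ("G.729", 8)):
    print(name, per_call_kbps(rate), max_calls(1000, rate))
# prints:
# G.711 80.0 12
# G.729 24.0 41
```

Note that the fixed header overhead weighs much more heavily on the low-rate codec, so an 8:1 payload saving becomes roughly a 3.3:1 saving on the wire in this setup.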

The most widely used codecs in the VoIP world are G.711a (A-law), G.711u (μ-law) and G.729. Other well-known codecs include G.722, G.723, G.726 and more.
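G.711u is based on μ-law companding: quiet samples get finer quantization steps than loud ones, which suits speech. The Python sketch below shows the continuous μ-law curve with μ = 255 and its inverse. G.711 itself uses a segmented, piecewise-linear approximation of this curve and its own 8-bit code layout, which this sketch does not reproduce.

```python
import math

MU = 255.0  # mu = 255, as used by G.711 mu-law

def mu_law_compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

# A quiet sample (1% of full scale) is stretched to roughly 23% of the range,
# so it receives many more of the 256 available codes after quantization.
print(mu_law_compress(0.01))
```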

Silence Suppression and Comfort Noise


It is worth mentioning that one way to save bandwidth is called “Silence Suppression”, which means not sending traffic when silence is detected in the middle of a conversation. Another interesting concept is “Comfort Noise”. During silence suppression, the listener could “feel” that the line was disconnected and hang up. Comfort noise is a low-amplitude noise signal generated locally at the receiving end (not transmitted over the network) by VoIP phones during silence suppression periods, to make the listener feel that the line is still connected.
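A minimal Python sketch of the receiving side: frames that arrive are played as they are, and gaps left by silence suppression (marked here as None) are filled with locally generated low-amplitude random noise. The 20 ms / 160-sample frame size, the uniform noise and the 0.001 amplitude are illustrative assumptions; more elaborate implementations try to match the noise level to the sender's background noise.

```python
import random

FRAME_SAMPLES = 160  # 20 ms of audio at 8 kHz (assumed frame size)

def comfort_noise_frame(rng, amplitude=0.001):
    """Low-amplitude uniform noise, generated locally rather than received."""
    return [rng.uniform(-amplitude, amplitude) for _ in range(FRAME_SAMPLES)]

def playout(received_frames, amplitude=0.001, seed=0):
    """Pass received frames through; replace suppressed frames (None) with noise."""
    rng = random.Random(seed)
    out = []
    for frame in received_frames:
        if frame is None:
            out.append(comfort_noise_frame(rng, amplitude))
        else:
            out.append(frame)
    return out

speech = [0.3] * FRAME_SAMPLES  # placeholder for a decoded speech frame
played = playout([speech, None, None, speech])
print([max(abs(s) for s in f) <= 0.001 for f in played])
# prints [False, True, True, False]
```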

Silence suppression works in conjunction with “Voice Activity Detection (VAD)”. When no voice is detected, silence suppression is in effect, and when voice is detected, silence suppression is disabled. This happens many times during a call. There is a time lag between the detection and the state switch. There is also, I believe, a loudness threshold that the voice has to exceed in order to be detected rather than classified as background noise.
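Here is a simplified energy-based VAD in Python, with a loudness threshold and a "hangover" that keeps transmitting for a few frames after speech stops, which models the time lag described above. The RMS measure, the 0.02 threshold and the 3-frame hangover are hypothetical choices; production VADs (such as the one defined for G.729 Annex B) use more elaborate features.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def simple_vad(frames, threshold=0.02, hangover_frames=3):
    """Return one decision per frame: True = send (voice), False = suppress."""
    decisions = []
    hang = 0
    for frame in frames:
        if frame_rms(frame) > threshold:
            hang = hangover_frames      # voice: restart the hangover counter
            decisions.append(True)
        elif hang > 0:
            hang -= 1                   # recent voice: keep sending for a while
            decisions.append(True)
        else:
            decisions.append(False)     # silence: suppress the frame
    return decisions

loud = [0.5] * 160
quiet = [0.001] * 160
print(simple_vad([quiet, loud, quiet, quiet, quiet, quiet]))
# prints [False, True, True, True, True, False]
```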

Most IP phones allow silence suppression to be enabled or disabled in their voice codec settings, although they may not expose the VAD configuration parameters. In theory, the silence suppression on/off setting should also show up during SIP negotiation, in the Session Description Protocol (SDP) body of the SIP message, to inform the remote system whether this phone uses silence suppression or not:


 a=silenceSupp:off
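A receiving system would need to look for this attribute in the SDP body. The Python sketch below is an illustration only, not taken from any particular SIP stack: it scans the SDP lines for a=silenceSupp and reports the on/off value, reading only the first token after the colon so that any further parameters on the line are ignored. The example SDP is an abbreviated, hypothetical fragment (a complete SDP body also carries o=, s=, c= and t= lines).

```python
def silence_suppression(sdp):
    """Return True ("on"), False ("off") or None (attribute absent or unknown)."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=silenceSupp:"):
            tokens = line[len("a=silenceSupp:"):].split()
            if tokens and tokens[0].lower() == "on":
                return True
            if tokens and tokens[0].lower() == "off":
                return False
            return None
    return None

example_sdp = "\r\n".join([
    "v=0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "a=silenceSupp:off",
])
print(silence_suppression(example_sdp))  # prints False
```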


Acoustic Echo Cancellation (AEC)

During a hands-free call, the voice of the sender, ‘Alice’, is played through the loudspeaker of the receiver, ‘Bob’, propagates through the air and is partially captured by Bob’s own microphone, causing feedback.

If this feedback signal is transmitted back to Alice, she is going to hear an echo of her own voice. So, before transmitting, Bob’s phone should subtract the feedback signal from its microphone signal. This feedback is essentially a copy of Alice’s voice with some gain adjustment (usually attenuation) and some time delay, shaped by the room acoustics. This process is known as “Acoustic Echo Cancellation”.
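One common way to estimate the feedback is an adaptive filter that learns the loudspeaker-to-microphone echo path from Alice's (far-end) signal and subtracts its estimate from Bob's microphone signal. The Python sketch below uses the normalized least-mean-squares (NLMS) algorithm; the tap count, step size and simulated echo (Alice's signal attenuated to 0.6 and delayed by 5 samples, with white noise standing in for her voice) are assumptions chosen for illustration. Real cancellers also need double-talk detection and much longer filters.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS)."""
    w = [0.0] * taps       # estimated echo-path impulse response
    x_buf = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_estimate                  # what would be sent back to Alice
        norm = eps + sum(xi * xi for xi in x_buf)
        step = mu * e / norm                   # normalized step size
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    alice = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # stand-in for Alice's voice
    delay, gain = 5, 0.6
    bob_mic = [0.0] * delay + [gain * s for s in alice[:-delay]]  # echo only, Bob silent
    residual = nlms_echo_cancel(alice, bob_mic)
    before = sum(s * s for s in bob_mic[-1000:])
    after = sum(s * s for s in residual[-1000:])
    print(after / before)  # a very small ratio once the filter has converged
```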

Wide-Band Codecs Versus Narrow-Band Codecs

As mentioned above, the human voice spectrum contains frequency components up to 7 kHz, so by sampling at 8 kHz we lose some details of the voice. Codecs designed under this assumption are known as narrow-band codecs. Wide-band codecs assume a higher cutoff frequency for the voice and preserve these details by sampling at 16 kHz. Of course, that requires more bandwidth to transmit the audio from sender to receiver. The main objective of wide-band codecs is to compress this data as efficiently as possible without sacrificing audio quality.
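The arithmetic behind the two families, using the sampling theorem and the 8 bits per sample assumed earlier in this post (real codec inputs often use more bits per sample, so these raw rates are only indicative), with sampling rate f_s and b bits per sample:

```latex
f_{\max} = \frac{f_s}{2}:\qquad
\frac{8\ \text{kHz}}{2} = 4\ \text{kHz} < 7\ \text{kHz},\qquad
\frac{16\ \text{kHz}}{2} = 8\ \text{kHz} \ge 7\ \text{kHz}

R_{\text{raw}} = f_s \times b:\qquad
8000 \times 8 = 64\ \text{kbit/s},\qquad
16000 \times 8 = 128\ \text{kbit/s}
```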

Digital Signal Processor (DSP)

Codec algorithms usually require many mathematical operations. A DSP is a microprocessor designed specifically to serve the needs of digital signal processing computations. A softphone implements the codec algorithm in software without any DSP assistance. IP phone terminals, however, usually contain a DSP chip (or a DSP embedded in a larger chip) to run the codec algorithm more efficiently than the general-purpose CPUs found in our laptops or mobile phones. These DSPs run programs that implement digital signal processing techniques. For more information about signal processing concepts, refer to the book Digital Signal Processing (4th Edition).
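To give a flavour of the computations a DSP is built for, here is a finite impulse response (FIR) filter in Python. Its inner loop is a chain of multiply-accumulate (MAC) operations, which DSPs typically execute in a single instruction cycle; the 4-tap moving-average coefficients are an arbitrary example of a simple low-pass filter.

```python
def fir_filter(samples, coeffs):
    """y[n] = sum over k of coeffs[k] * x[n - k], computed with explicit MACs."""
    history = [0.0] * len(coeffs)   # x[n], x[n-1], ..., newest first
    out = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0.0
        for c, h in zip(coeffs, history):
            acc += c * h            # one multiply-accumulate (MAC)
        out.append(acc)
    return out

moving_average = [0.25, 0.25, 0.25, 0.25]
print(fir_filter([1.0, 1.0, 1.0, 1.0, 1.0], moving_average))
# prints [0.25, 0.5, 0.75, 1.0, 1.0]
```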
