Sunday, 21 August 2011

IP Phones, What is under the hood? (Part 2)


The second component of an IP Phone is the firmware. Vendors consider the firmware their "secret sauce". The more reliable, feature-rich, easy to configure and easy to troubleshoot it is, the wider the phone's customer base will be.

Under the hood, the firmware consists of the
  •   Embedded Real-Time Operating System
  •   Phone Application(s)

Embedded Real-Time Operating System (RTOS)
Users may not really care which RTOS is running on their phones, but engineers are probably more interested in the topic. When an embedded system designer (an IP Phone is no exception) selects an operating system for the system under design, there are many factors to consider:
  • How is this OS going to perform in a real-time telecommunication device such as an IP Phone? What features does its scheduler support?
  • How does it handle interrupts?
  • How much work is it to customize the OS to my hardware design?
  • Are device drivers available for each chip (mentioned in Part 1), or do I have to develop them in house?
  • What file systems does it support? How reliable are they through power failure? Do I need to integrate a special file system?
  • Are development and debugging tools available for this OS? Is it supported by the vendor and maybe by other communities?
  • What are the licensing terms for purchasing these development tools?
  • What royalty fees have to be paid when the product is commercialized? Is it an annual fee or a cost per shipped unit? How does the vendor keep track of shipped units?
  • Does this OS support a Graphical User Interface (GUI)? If not, what third-party GUI engines are available? What are their licensing schemes?
There are many technical and business factors to consider. During my engineering life I have also found that it is not just the OS that matters, but also the vendor behind it. In competitive environments that rush for a shorter time-to-market, a reliable OS from a reliable vendor with a reliable engineering team makes a big difference.

It would be unfair to say that there is only one good vendor in the market and that it is XYZ, because several are really good. From my own personal experience, Wind River Systems, with both VxWorks and its embedded Linux, is a really good RTOS provider.

Linux is probably the fastest-growing embedded OS. Firmware engineers will want to learn how the Linux kernel really works, so that they can write device drivers for the different hardware components of the system (the IP Phone in our case) and understand the environment that the application part of the firmware will interact with, and thus enhance their designs.

To learn more about Linux device drivers, check this book out: Linux Device Drivers, 3rd Edition
To learn more about the Linux kernel, check this book out: Understanding the Linux Kernel, Third Edition

TO BE CONTINUED...

Monday, 8 August 2011

IP Phones, What is under the hood? (Part 1)


In previous posts, we discussed the concepts behind VoIP and the underlying communication protocols, such as SIP, RTP, TCP/IP, etc. In this post I am going to discuss the internal design of an IP phone and how it affects the performance of the phone.

The IP Phone, in general, is an “Embedded System”. Embedded systems (the IP Phone is no exception) have 3 major components:
  1. Custom-Designed Circuit (a.k.a. PCB)
  2. Custom-Designed Software (a.k.a. “Firmware”)
  3. Custom-Designed Chassis (a.k.a. Industrial Design)
Custom-Designed Hardware
An IP Phone is a microprocessor-based system, similar in a way to a PC. It has a microprocessor (CPU), RAM, Flash memory (for permanent storage, similar to your PC hard disk) and an Ethernet controller (with an RJ-45 jack). It also has telephone-related components such as a keypad, a handset and a hands-free microphone. A phone may or may not have a display (LCD) or a Digital Signal Processor (DSP).



The reason we call the hardware design “custom” is that each IP Phone may have a different internal layout than another phone, even one made by the same vendor. Unlike in the PC world, IP Phone users cannot simply grab an application built for MIPS32 or ARM and run it on their IP Phones. Although the application was built for that target CPU, the interconnections between the CPU and the rest of the chips in the system vary a lot from one phone to another, and unfortunately there are no standards to follow from that perspective, compared to traditional PCs. An exception to this rule would be a Java application (we will discuss Java apps later on in this blog).


CPU
Most phones run on a RISC microprocessor, usually ARM or MIPS32. One of the interesting things to look at in an IP Phone specification sheet, in addition to whether it uses an ARM or a MIPS32 CPU, is the clock speed it runs at. The higher the CPU clock, the faster the CPU is, and the more energy it consumes.


DSP
The main purpose of the DSP is to offload the mathematical workload of Codec computation, Acoustic Echo Cancellation and Equalization (illustrated in the post: Voice Codecs) from the main CPU. That design devotes the DSP resources to ensuring better voice quality. Believe it or not, some IP Phone designers may choose not to have DSP hardware in their phones! In that case, this work has to be done by the main CPU. Since the CPU is a shared resource in a multitasking environment, the firmware must be written very carefully to ensure voice processing gets enough CPU resources when needed, or the voice quality could be at stake. I would personally prefer to buy a phone with a DSP in it.

Ethernet (LAN and Pass-Thru Ports and Wi-Fi)
Network connectivity on the IP Phone may take different forms. A desk IP phone will have an RJ45 wired interface at 10/100 Mbps or maybe 1 Gbps. A portable office phone will have a Wi-Fi chip in it. Either way, a controller is needed to transmit and receive frames. This is the interface to the outside world that carries SIP (signaling) packets, RTP packets (filled with encoded voice) and probably other types of traffic. This is also where the IT administrator reaches the phone remotely for configuration, maintenance and technical support.
The main function of the Pass-Thru Port is to allow the user to connect a laptop/PC to the phone and use the phone as a “switch” to connect to the rest of the LAN. If the phone doesn’t have a pass-thru port, 2 ports on the network will be required: one for the IP Phone and one for the PC/laptop. Having it saves ports on the LAN switch and cabling as well.


Display
Displays vary: a monochrome LCD with 3 lines of text, a larger multiline text display with some symbols on it, a grey-scale display that supports multiple shades of grey from black to white, a nice VGA-resolution color display, or even a high-definition LCD. Regardless of the size of the LCD and its color resolution, it needs a “Display Chip” that sits between the CPU and the actual LCD panel.


RAM & FLASH
RAM is run-time volatile storage for programs. In the embedded system world, the concept of page-swapping usually does not exist. Page-swapping means that when the software tasks running on a microprocessor at a given time need more memory than the actual physical RAM on the system, the operating system memory manager can create the sense that a larger memory exists for the applications to use, by taking some space on permanent storage (the hard drive in the PC world) and moving data pages (a page is typically 4 KB) back and forth between physical memory and that disk space. As I mentioned, this feature is nice to have on general-purpose computing systems, but it is not used in an Embedded System.

That means the more memory the phone has, the more space it has for applications to work. If I am the IT manager or a business owner and I have ambitious plans to run an attendance application for employees on top of the IP Phone, or news feeds right on their phone screens, etc. (most phones provide these capabilities today), I must check how much RAM the phone has.
Flash memory in an embedded device stores the firmware image that the phone loads and runs whenever it is turned off and back on. Firmware, like any other software, may change after the phone is sold to customers. These changes may contain new features, defect fixes and even necessary security fixes that are of interest to the customer. If the flash size of the phone is small, the phone owner may be stuck in a situation where they cannot upgrade the firmware because the flash cannot store a bigger firmware image.

Also, the phone flash may store configuration data required for operation. It may also be used to save the Call History (Missed, Answered, Dialed calls). It may also store the Phone Book or Employee Directory of a company, so that the user can look up phone numbers by name and dial them directly without memorizing too many numbers. Bottom line: check the size of the flash memory. There’s also a relation between RAM and Flash memory sizes. In most of the phones I have seen, the flash memory is ½ the RAM size, but in some the flash size is ¼ the RAM size.

To be continued...

Thursday, 4 August 2011

Voice Codecs

Studies of the human voice show that it contains frequencies from 0 to 7 kHz. However, most of the energy of the voice is below 4 kHz. So, legacy telephone systems apply a Low-Pass Filter (LPF) with a cutoff frequency of 4 kHz to the audio signal at the sender side. According to sampling theory, in order to be able to reconstruct the signal at the receiving side, the sampling rate must be at least twice the highest frequency in the signal, so the minimum sampling rate = 2 x 4 kHz = 8 kHz.

The sampling is done using an Analog to Digital Converter (ADC). Each sample is quantized according to the signal amplitude into one of 256 different values, i.e. each sample is represented by 8 bits. The bit rate of the raw uncompressed audio (known as Pulse Code Modulation, PCM) = 8 bits/sample x 8,000 samples/second = 64 kbit/s.
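The arithmetic above can be written down as a small Python sketch. The function names are mine, chosen only for illustration:

```python
def min_sampling_rate_hz(max_frequency_hz: int) -> int:
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * max_frequency_hz


def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample


rate_hz = min_sampling_rate_hz(4000)   # 4 kHz low-pass cutoff
bit_rate = pcm_bit_rate(rate_hz, 8)    # 8 bits per sample
print(rate_hz, bit_rate)               # 8000 64000
```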

The PCM audio at that stage can be “encoded” or “compressed” using a “codec” algorithm at the sender and “decoded” or “decompressed” at the receiving end using the same “codec” algorithm (or, to be precise, the reverse of it!).

There are a variety of codecs. Each is designed with certain assumptions and objectives in mind. Some are free and some have royalty fees attached to them. One of the main objectives of this process is to reduce the network bandwidth required for a VoIP conversation. The more the voice is compressed, the more concurrent calls can take place over a fixed bandwidth. However, a higher-compression Codec may not produce the best audio quality if, for example, the assumptions used in its design are not met by the underlying IP network.
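To get a feel for the bandwidth trade-off, here is a rough sketch that uses the nominal payload bit rates of G.711 (64 kbit/s) and G.729 (8 kbit/s). It deliberately ignores the RTP/UDP/IP header overhead carried by every packet, so real capacity is lower; the function name and the 1024 kbit/s link are just assumptions for the example:

```python
# Nominal codec payload bit rates in kbit/s (voice payload only).
CODEC_KBPS = {"G.711": 64, "G.729": 8}


def max_concurrent_calls(link_kbps: int, codec: str) -> int:
    """Upper bound on simultaneous one-way voice streams on a link.

    Ignores RTP/UDP/IP/Ethernet header overhead, so real capacity is lower.
    """
    return link_kbps // CODEC_KBPS[codec]


for name in CODEC_KBPS:
    print(name, max_concurrent_calls(1024, name))
```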

The most used Codecs in the VoIP world are G.711a (A-law), G.711u (µ-law) and G.729. However, a bunch of other Codecs are also well known in the VoIP world: G.722, G.723, G.726 and more.

Silence Suppression, and Comfort Noise


It is worth mentioning that one way to save bandwidth is called “Silence Suppression”, which means not sending traffic when silence is detected in the middle of a conversation. Another interesting concept is “Comfort Noise”. During silence suppression, the person listening could “feel” that the line was disconnected and hang up. Comfort Noise is a low-amplitude noise signal generated locally at the receiving end (not transmitted over the network) in VoIP phones during silence suppression periods, to make the person on the phone feel that the line is still connected.

Silence suppression works in conjunction with “Voice Activity Detection (VAD)”. When no voice is detected, silence suppression is in effect, and when voice is detected, silence suppression is disabled. This happens many times during a call. There is a time lag between the detection and the state switching. There is also, I believe, a loudness threshold that the voice has to exceed in order to be detected and not classified as background noise.
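The threshold and the lag can be illustrated with a toy energy-based detector. This is only a sketch under my own assumptions (RMS energy as the loudness measure, a fixed threshold, and a "hangover" of a couple of frames); real phones use more elaborate VAD algorithms:

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def transmit_flags(frames, threshold, hangover_frames=2):
    """Return one boolean per frame: True = send it, False = suppress it.

    A frame counts as voice when its RMS exceeds `threshold`. After voice
    stops, `hangover_frames` more frames are still sent, which models the
    lag between detection and state switching.
    """
    flags = []
    remaining = 0
    for frame in frames:
        if rms(frame) > threshold:
            remaining = hangover_frames
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags


loud = [1000, -1000, 1000, -1000]
quiet = [5, -5, 5, -5]
print(transmit_flags([quiet, loud, quiet, quiet, quiet, loud], threshold=100))
```

With these made-up frames, the two quiet frames right after the loud one are still transmitted, and only the third quiet frame is suppressed.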

Most IP Phones allow enabling/disabling silence suppression in their Voice Codec settings. They may not expose VAD configuration parameters though. The silence suppression on/off setting should, in theory, show up during SIP negotiation, in the Session Description Protocol (SDP) body of the SIP message, in order to inform the remote system whether this phone supports silence suppression or not:


 a=silenceSupp:off
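As a rough illustration of how a receiver might look for this attribute, the sketch below collects the `a=` lines of an SDP body into a dictionary. The SDP body is a made-up example (the addresses are documentation values) and the helper name is my own:

```python
def sdp_attributes(sdp_body: str) -> dict:
    """Collect `a=name:value` lines from an SDP body into a dict.

    Attributes without a value (for example `a=sendrecv`) map to "".
    If an attribute repeats, the last occurrence wins in this sketch.
    """
    attrs = {}
    for line in sdp_body.splitlines():
        line = line.strip()
        if line.startswith("a="):
            name, _, value = line[2:].partition(":")
            attrs[name] = value
    return attrs


# Hypothetical SDP body for illustration only.
example_sdp = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=silenceSupp:off
"""

attrs = sdp_attributes(example_sdp)
print(attrs.get("silenceSupp"))
```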


Acoustic Echo Cancellation (AEC)

During a hands-free call, the voice of the sender ‘Alice’ is played on the loudspeaker of the receiver ‘Bob’, propagates through the air and is partially captured by Bob’s own microphone, causing feedback.







 If this feedback signal is transmitted back to the sender ‘Alice’, then ‘Alice’ is going to hear an echo of her own voice. So one of the things that should be done at ‘Bob’s’ phone before transmitting is to subtract the feedback signal from the microphone signal. This feedback is actually a modified copy of ‘Alice’s’ voice (with some gain adjustment and some minor time delay added). This process is known as “Acoustic Echo Cancellation”.
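The subtraction idea can be sketched in a few lines of Python. This toy version assumes the gain and the delay of the echo path are already known and constant, which is a big simplification: real echo cancellers have to estimate them continuously, typically with adaptive filters. All names and numbers are illustrative:

```python
def echo_of(far_end, gain, delay):
    """Model the echo as a delayed, scaled copy of the far-end signal."""
    return [0.0] * delay + [gain * s for s in far_end[:len(far_end) - delay]]


def cancel_echo(mic, far_end, gain, delay):
    """Subtract the estimated echo from the microphone signal."""
    estimate = echo_of(far_end, gain, delay)
    return [m - e for m, e in zip(mic, estimate)]


alice = [1.0, 2.0, 3.0, 4.0]     # far-end voice played on Bob's loudspeaker
bob = [0.5, 0.0, -0.5, 0.0]      # Bob's own voice
# What Bob's microphone captures: his voice plus Alice's echo.
mic = [b + e for b, e in zip(bob, echo_of(alice, gain=0.5, delay=1))]

print(cancel_echo(mic, alice, gain=0.5, delay=1))
```

After cancellation only Bob's own voice is left to be transmitted.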

Wide-Band Codecs Versus Narrow-Band Codecs

As we mentioned above, the human voice spectrum contains frequency components up to 7 kHz. By sampling at 8 kHz, which captures only frequencies below 4 kHz, we are losing some details of the voice. The Codecs that work on this assumption are known as Narrow-Band Codecs. Wide-band Codecs assume a higher cutoff frequency for the voice and take these details into consideration by sampling at 16 kHz. Of course, that requires more bandwidth to transmit the audio from sender to receiver. The main objective of Wide-band Codecs is to compress this data as efficiently as possible without sacrificing the quality of the audio.

Digital Signal Processor (DSP)

Codec algorithms usually require many mathematical operations. A DSP is a microprocessor designed to serve the needs of digital signal processing computations. A softphone implements the “Codec” algorithm in software without any DSP assistance. However, IP Phone terminals usually contain a DSP chip (or a DSP embedded in a bigger chip) to carry out the “Codec” algorithm more efficiently than the general-purpose CPUs we find in our laptops or mobile phones. These DSPs run programs that implement digital signal processing concepts. For more information about Signal Processing Concepts, refer to this book: Digital Signal Processing (4th Edition)

Monday, 1 August 2011


Session Initiation Protocol (SIP)


Session Initiation Protocol (SIP) is used in VoIP networks for signaling. Signaling, in simple terms, is the exchange of control messages such as dialing, ringing, busy, call established, hang-up, transfer, hold, unhold, etc. It also allows the two remote systems trying to establish a phone call to agree on how to format the voice and where to send it. SIP is a text-based protocol, defined by the IETF in a number of RFCs.

SIP starts with a Registration Process, where the phone (e.g. IP Phone, ATA adapter, softphone) sends a registration request to the SIP server (e.g. IP-PBX). This request contains the user's telephone number and the IP address and port at which the user can be reached if there is an incoming call.
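Since SIP is text-based, a registration request is easy to picture. The sketch below builds a minimal REGISTER message in the general shape described by RFC 3261. The extension 1001, the domain example.com and the address 192.0.2.10 are placeholders, and the fixed branch, tag and Call-ID values are for illustration only (a real phone generates unique values for each request):

```python
def build_register(user, domain, contact_ip, contact_port=5060, expires=3600):
    """Build a minimal SIP REGISTER request as text (CRLF line endings)."""
    aor = f"sip:{user}@{domain}"  # address-of-record being registered
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {contact_ip}:{contact_port};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag=example-tag",
        f"To: <{aor}>",
        "Call-ID: example-call-id@" + contact_ip,
        "CSeq: 1 REGISTER",
        # Contact tells the server where to reach this user for incoming calls.
        f"Contact: <sip:{user}@{contact_ip}:{contact_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_register("1001", "example.com", "192.0.2.10"))
```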


Figure 1: SIP Registration Process

For authentication, the server challenges the user agent to prove its identity, and the user agent replies with another registration message that contains a challenge-response to identify itself. An attacker watching the messages should not be able to analyze the challenge and its response and discover the user’s password. If the attacker managed to do so for whatever reason, he/she could receive all of the user’s phone calls.
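SIP commonly uses HTTP Digest authentication for this challenge-response. As a sketch, and assuming the basic MD5 form without the optional qop extension, the response is a hash computed over the credentials and the server's nonce, so the password itself never travels over the wire. The user name, realm, password and nonce below are made up:

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex-encoded MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response without the optional qop extension (RFC 2617 style)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values; the nonce comes from the server's challenge.
print(digest_response("1001", "example.com", "secret",
                      "REGISTER", "sip:example.com", "abc123"))
```

Because the server issues a fresh nonce with each challenge, a captured response cannot simply be replayed later.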


In a simple call scenario, where a user is trying to place a call, the caller's phone sends a SIP INVITE message to the server. The server again authenticates the identity of the user, then forwards the request to the remote phone. The remote phone rings and sends back a ringing message. The caller's phone starts to play a ring-back tone to give the caller an indication that he/she should wait for an answer.



Figure 2: Call Flow using SIP protocol

When the remote user answers the call, a 200 OK status message is sent back to the caller's phone, via the server in the middle. The caller's phone acknowledges it by sending a SIP ACK message. After that point, RTP starts to flow directly between the phones (not involving the server), carrying the voice (and possibly video, if both phones are advanced videophones with a camera and display).

In the INVITE message, the caller's phone tells the remote phone where to send the RTP (IP address, protocol and port) and gives a list of the codecs it can understand. In the 200 OK status message, the callee's phone says where it should receive the RTP (IP address, protocol, port). It also sends a list of voice codecs it can accept. The 2 phones agree on the first codec they both understand. That process is called "Codec Negotiation". The beauty of that process is that both phones, which could be manufactured by different vendors, have zero knowledge about each other, yet they can agree on how to format the audio. If negotiation fails, the call fails and the 2 phones cannot communicate.
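The "first codec both understand" rule can be sketched as follows. This is a simplification of the real SDP offer/answer exchange, and the codec lists are only examples:

```python
def negotiate_codec(offered, supported):
    """Return the first offered codec the other side supports, or None."""
    supported_set = set(supported)
    for codec in offered:  # the offer is in the caller's order of preference
        if codec in supported_set:
            return codec
    return None  # no common codec: the call cannot be set up


caller_offer = ["G.729", "G.711u", "G.711a"]
callee_supports = ["G.711a", "G.711u"]
print(negotiate_codec(caller_offer, callee_supports))  # G.711u
```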

To study more about the SIP protocol, refer to this book: SIP: Understanding the Session Initiation Protocol (Artech House Telecommunications)


Real-time Transport Protocol (RTP)

RTP is the protocol used to transfer the “voice” or “video” stream from one phone to the other and vice versa. The voice itself is represented using some codec. The term codec is a combination of “compression-decompression” or, sometimes, “coder-decoder”. This implies that voice can be compressed at the sender's side and decompressed at the receiver's side, mainly to reduce the bandwidth required over the network for each call.
Figure 3: RTP packet header format
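
To make the header layout more concrete, here is a small sketch that decodes the 12-byte fixed RTP header defined in RFC 3550 (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp and SSRC). The sample packet is fabricated for the example:

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit and two 32-bit fields.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Fabricated header: version 2, payload type 0 (PCMU, i.e. G.711 µ-law).
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD)
print(parse_rtp_header(sample))
```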

For more information; check this book out: RTP: Audio and Video for the Internet
  

Both SIP and RTP, in the Internet Protocol Layering Model, are considered “Application Protocols”. They are usually encapsulated inside UDP (often), or TCP or SCTP (sometimes), over IP. To understand the TCP/IP protocol suite in depth, refer to one of these 2 books:

Introduction to Voice over Internet Protocol (VoIP)


Voice over Internet Protocol (VoIP) is a technology that allows voice (and possibly video) communication over IP networks. These IP networks could be public, such as the Internet, or private, such as Enterprise networks, or a combination of both. This technology offers some advantages over the legacy telephone networks; however, it also has some disadvantages. Engineers around the world are working hard to solve them, and fortunately they are making lots of progress. It is expected that legacy telephone systems will disappear over the years.

In order to understand the advantages of VoIP systems over the legacy telephone networks, let us assume a scenario of a company that has an office with a few hundred employees. Each employee has a LAN connection (usually Ethernet RJ45) as well as a telephone connection (usually RJ11). The IT room has all these LAN cables connected to a switch and the telephone cables connected to a PBX system. The PBX is also connected to external telephone lines (known as FXO) providing public switched telephone network (PSTN) service via a local carrier.

The PBX could have N lines for M users to share, for example 60 lines for 300 users. This is based on the fact that not all employees are communicating with external numbers simultaneously, and they probably need to communicate with each other more than externally. For an employee to make an external call, he dials “9, destination phone number”.
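
One classic way to reason about how many shared lines are enough is the Erlang B formula from telephone traffic engineering, which gives the probability that a call finds all lines busy. The sketch below uses the standard recursive form. The traffic figure of 0.1 Erlang per user is purely an assumption for the example, not a measured value:

```python
def erlang_b(lines: int, offered_erlangs: float) -> float:
    """Blocking probability for `lines` trunks and a given offered load."""
    blocking = 1.0  # with zero lines every call is blocked
    for n in range(1, lines + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


users = 300
erlangs_per_user = 0.1  # hypothetical: each user is on an external call 10% of the time
print(erlang_b(60, users * erlangs_per_user))
```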

The main disadvantage of this setup is that it requires extra cabling to each user. If you are in IT, you know that it is an extra headache to lay and maintain these extra cables across the building. Also, to add a new batch of users to the PBX, an extra hardware upgrade may be required (known as FXS cards).

If the same company has one or more extra branches in a remote location or even overseas, the local carrier shall charge the company long distance or overseas fees and the company cannot get around that unless they rent their own circuit from branch A to B.


In the VoIP world, the phone at the user-end is one of 3 types:
  • IP Phone
  • ATA device
  • Soft phone

An IP phone is a piece of hardware that looks like a normal phone, with the exception that it connects via RJ45. Internally, these phone sets have a different design than a traditional phone set. I will illustrate the internals of these phones in some detail in my next article. Examples of vendors of these phones are:
  • Cisco
  • Aastra
  • Polycom
  • Grandstream


An ATA device is a device that can be connected to any legacy telephone set; it converts the output into VoIP and has an RJ45 jack on it. An example of an ATA device is the Cisco PAP2T-NA, Internet Phone Adapter with 2 VoIP Ports

Softphones are pieces of software that run on your desktop or laptop and use the microphone and speaker as well as the Ethernet card of your PC to emulate a phone set. An example of a softphone is X-Lite.

IP-PBX

In the IT room, there is an IP-PBX. The IP-PBX is the backend system that facilitates phone calls between extensions or to the outside world. It also handles incoming calls, voicemail, call transfer, conferencing, Interactive Voice Response (IVR), Automatic Call Distribution (ACD), Busy Lamp Field (BLF), etc.

If the PSTN FXO lines are coming from the local carrier in the traditional telephone service form, a hardware system is required to convert the signal on the FXO lines to their VoIP equivalent.

One of the most famous open-source IP-PBXs is Asterisk. It is free software that runs on general-purpose Intel-based PCs or servers. There’s also a packaged flavor of Asterisk with a Web User Interface for easy administration, known as Trixbox. Asterisk and Trixbox offer all small-to-medium enterprise PBX features: extension-to-extension calls, Call Transfer, Call Conference, Dial-plans, Voicemail, SIP Trunks, etc. They work with a number of hardware cards to connect to the telephone service providers (FXO lines). Cards are made by Digium (the creator of Asterisk), Sangoma, and probably many others. If the lines come in bundles such as T1 links (24 telephone lines) or E1 links (30 telephone lines), more sophisticated hardware is required, known as a Media Gateway. To learn more about how to set up the free Asterisk PBX, refer to this book: Asterisk: The Definitive Guide

A Media Gateway can also perform other functions in a VoIP network. It can multiplex audio and/or video streams, allowing conference calls. Multiplexing audio/video streams in real time requires computational power that may exceed the capabilities of normal general-purpose hardware servers. That is especially true when the number of conference participants grows large. Media gateways rely on a different processor technology, known as digital signal processors (DSPs), that has the computational power to perform such a job.

Networking Protocols

VoIP, nowadays, relies on a bunch of networking protocols combined together to provide the voice communication service.

VoIP Advantages

  • Low rates for long distance and international calls
  • You can make and receive phone calls wherever there is a broadband connection
  • Free Caller ID, voicemail, call forward, conferencing are some of the many services included with most VoIP service providers.
  • Less Cabling for Enterprises
  • Most new IP-Phones can run XML Applications. Enterprises can use that to provide services to their employees.

VoIP Challenges

  • Emergency calls: When an emergency call is made from a legacy PSTN line, the location of the call is associated with the caller due to the nature of the fixed land line. Because VoIP is portable to any place in the world, there is no guarantee that a person calling emergency services is at the location he claims or at the one registered in the service provider's records.
  • Sound Quality and Reliability: VoIP may suffer from low sound quality and low reliability when the public Internet is involved in delivering the voice traffic. High delays, jitter and packet loss due to congestion in the network will cause noticeable degradation in the service quality.
  • Security: Since VoIP uses IP networks, it inherits all the security weaknesses of an IP network.
  • Power Failures: Legacy PSTN switches have batteries to ride out power outages for a few hours, so that the service continues uninterrupted. Since VoIP calls rely on many systems (IP Phones or ATA adapters, routers, switches, modems, servers), more systems have to be resilient to power failure. If all these systems are under one administration, it could be achievable, but that’s not always the case.
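
The jitter mentioned under Sound Quality above can be estimated from packet timestamps. The sketch below follows the running interarrival jitter estimate described in RFC 3550 (the RTP specification); the timestamps are invented, and both lists are assumed to be in the same unit, here milliseconds:

```python
def interarrival_jitter(send_times, receive_times):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold per-packet timestamps in the same unit (e.g. ms).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between two consecutive packets.
        transit_delta = (receive_times[i] - receive_times[i - 1]) - (
            send_times[i] - send_times[i - 1]
        )
        # Smooth with a gain of 1/16, as the RFC does.
        jitter += (abs(transit_delta) - jitter) / 16.0
    return jitter


sent = [0, 20, 40, 60]                  # one packet every 20 ms
steady = [50, 70, 90, 110]              # constant 50 ms network delay
bumpy = [50, 75, 90, 110]               # second packet arrives 5 ms late
print(interarrival_jitter(sent, steady), interarrival_jitter(sent, bumpy))
```

A constant delay, however large, gives zero jitter; only the variation in delay counts.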