Monday, 1 August 2011


Session Initiation Protocol (SIP)


Session Initiation Protocol (SIP) is the signaling protocol used in VoIP networks. Signaling, put simply, is the exchange of control messages such as dialing, ringing, busy, call established, hang-up, transfer, hold and unhold. SIP also lets the two remote systems trying to establish a phone call agree on how to format the voice and where to send it. It is a text-based protocol, defined by the IETF in a number of RFCs (the core specification is RFC 3261).

SIP starts with a registration process, in which the phone (e.g. an IP phone, an ATA or a softphone) sends a registration request (a SIP REGISTER message) to the SIP server (e.g. an IP-PBX). The request carries the user's telephone number together with the IP address and port at which the user can be reached when there is an incoming call.


Figure 1: SIP Registration Process
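To make the registration request more concrete, here is a minimal sketch of what a REGISTER message might look like when built by hand. The function name, the extension number, the domain, the addresses and the tag/branch values are all made up for illustration, and a real phone would add more headers and generate unique identifiers.

```python
def build_register(user, domain, contact_ip, contact_port, call_id,
                   cseq=1, expires=3600):
    """Return a minimal SIP REGISTER request as text (illustrative only)."""
    address_of_record = f"sip:{user}@{domain}"
    lines = [
        # Request line: method, registrar URI, protocol version.
        f"REGISTER sip:{domain} SIP/2.0",
        # Where the response should be sent back to (hypothetical branch value).
        f"Via: SIP/2.0/UDP {contact_ip}:{contact_port};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        # For a registration, From and To both name the user being registered.
        f"From: <{address_of_record}>;tag=example-tag",
        f"To: <{address_of_record}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        # The IP address and port where the user can be reached for incoming calls.
        f"Contact: <sip:{user}@{contact_ip}:{contact_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP is text-based: header lines end with CRLF, and an empty line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_register("1001", "pbx.example.com", "192.0.2.10", 5060, "abc123"))
```

The Contact header is the part that answers the question "where should the server send calls for this user?".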

For authentication, the server challenges the user agent to prove its identity, and the user agent replies with a second registration message that contains the challenge response. An attacker watching the messages should not be able to recover the user's password by analyzing the challenge and its response. If the attacker did manage to do so, for whatever reason, he/she could register as that user and receive all of the user's phone calls.
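SIP borrows its challenge-response scheme from HTTP Digest authentication. The sketch below shows the classic MD5 form of the calculation, assuming the optional qop parameter is not used; the function names and all credential values are hypothetical. The point to notice is that only hashes travel over the network, never the password itself.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a string, as used by classic Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest response for a challenge (no qop, MD5 algorithm)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part, known to both sides
    ha2 = md5_hex(f"{method}:{uri}")                 # ties the response to this request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # nonce comes from the server's challenge


if __name__ == "__main__":
    # Made-up values: the server sent realm and nonce in its challenge.
    response = digest_response(
        username="1001",
        realm="pbx.example.com",
        password="not-a-real-password",
        method="REGISTER",
        uri="sip:pbx.example.com",
        nonce="example-nonce",
    )
    print(response)
```

The server performs the same calculation with its stored copy of the credentials and accepts the registration only if the two results match. Because the nonce changes from challenge to challenge, a captured response cannot simply be replayed later, although a weak password can still be guessed offline.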


In a simple call scenario, where a user is trying to place a call, the caller's phone sends a SIP INVITE message to the server. The server again authenticates the identity of the user, then forwards the request to the remote phone. The remote phone rings and sends back a ringing message. The caller's phone then starts to play a ring-back tone, to give the caller an indication that he/she should wait for an answer.



Figure 2: Call Flow using SIP protocol

When the remote user answers the call, a 200 OK status message is sent back to the caller's phone via the server in the middle. The caller's phone acknowledges it by sending a SIP ACK message, which completes the handshake. From that point on, RTP flows directly between the phones (without involving the server), carrying the voice (and possibly video, if both phones are videophones with a camera and display).
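The call flow described above can be written down as a simple ordered list of messages. This is only a simplified sketch: it leaves out the authentication challenge and provisional responses such as 100 Trying, it assumes the server stays in the signaling path for the ACK, and the names CALL_FLOW and format_flow are made up for this example.

```python
# Each entry is (sender, receiver, message), in the order the messages are sent.
CALL_FLOW = [
    ("caller", "server", "INVITE"),
    ("server", "callee", "INVITE"),
    ("callee", "server", "180 Ringing"),
    ("server", "caller", "180 Ringing"),
    ("callee", "server", "200 OK"),
    ("server", "caller", "200 OK"),
    ("caller", "server", "ACK"),
    ("server", "callee", "ACK"),
    # After the handshake, media flows directly between the two phones.
    ("caller", "callee", "RTP media"),
    ("callee", "caller", "RTP media"),
]


def format_flow(flow):
    """Render the flow as one readable text line per message."""
    return [f"{src:>6} -> {dst:<6} : {msg}" for src, dst, msg in flow]


if __name__ == "__main__":
    for line in format_flow(CALL_FLOW):
        print(line)
```

Note that the signaling messages all pass through the server, while the last two entries skip it entirely.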

In the INVITE message, the caller's phone tells the remote phone where to send the RTP (IP address, protocol and port) and lists the codecs it can understand. In the 200 OK status message, the callee's phone states where it wants to receive the RTP (IP address, protocol and port) and also sends a list of the voice codecs it can accept. The two phones agree on the first codec they both understand. This process is called "Codec Negotiation". The beauty of the process is that the two phones, which may be made by different vendors and have zero prior knowledge of each other, can still agree on how to format the audio. If negotiation fails, the call fails and the two phones cannot communicate.
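The "first codec they both understand" rule is easy to sketch in code. The function name and the codec lists below are made up for illustration; real phones exchange these lists inside the session description carried by the INVITE and the 200 OK.

```python
def negotiate_codec(offered, supported):
    """Return the first offered codec that the other side also supports.

    `offered` is in the offering phone's order of preference.
    Returns None when there is no common codec, in which case the call fails.
    """
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None


if __name__ == "__main__":
    caller_codecs = ["G729", "PCMU", "PCMA"]  # hypothetical caller preferences
    callee_codecs = ["PCMA", "PCMU"]          # hypothetical callee capabilities
    print(negotiate_codec(caller_codecs, callee_codecs))
    print(negotiate_codec(["G729"], callee_codecs))
```

In the first call the phones settle on PCMU, because it is the first entry in the caller's list that the callee also accepts. In the second call there is no common codec, so the function returns None and the call would fail.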

To study the SIP protocol further, refer to this book: SIP: Understanding the Session Initiation Protocol (Artech House Telecommunications)


Real-time Transport Protocol (RTP)

RTP is the protocol used to transfer the “voice” or “video” stream from one phone to the other and vice versa. The voice itself is encoded using some codec. The term codec is short for “coder-decoder” (it is sometimes also explained as “compression-decompression”). This implies that the voice can be compressed at the sender's side and decompressed at the receiver's side, mainly to reduce the network bandwidth required for each call.

Figure 3: RTP packet header format
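As a rough companion to the header format, the sketch below packs and parses the 12-byte fixed RTP header using Python's struct module. It assumes version 2 with no padding, no header extension and no CSRC entries; the function names and the sample field values are made up for illustration.

```python
import struct

RTP_HEADER_FORMAT = "!BBHII"  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6  # version in the top two bits; P, X and CC are all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(RTP_HEADER_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Decode the fixed RTP header from the start of a packet."""
    if len(data) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(RTP_HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,  # identifies the codec of the payload
        "sequence_number": seq,        # lets the receiver detect loss and reordering
        "timestamp": timestamp,        # sampling instant, used for playout timing
        "ssrc": ssrc,                  # identifies the stream's source
    }


if __name__ == "__main__":
    header = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
    print(parse_rtp_header(header))
```

Payload type 0 is the static assignment for PCMU (G.711 mu-law) audio, so a receiver seeing this header knows how to decode the bytes that follow it.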

For more information, check out this book: RTP: Audio and Video for the Internet
  

Both SIP and RTP are considered “Application Protocols” in the Internet Protocol Layering Model. They are usually encapsulated in UDP (most often), or sometimes in TCP or SCTP, over IP. To understand the TCP/IP protocol suite in depth, refer to one of these 2 books:
