Module 1 - VoIP Fundamentals

Published on May 2016


ITU Centres of Excellence for Europe

NGN Services VoIP and IPTV
Module 1:

VoIP fundamentals

Table of contents
1.1. ITU NGN Standards for main real time services
  1.1.1. ITU IPTV Standards
  1.1.2. VoIP Standards
1.2. NGN and Internet fundamentals
1.3. VoIP infrastructure
1.4. Peer-to-peer VoIP
1.5. VoIP protocols and codecs
1.6. Signaling for VoIP
  1.6.1. H.323
  1.6.2. SIP
  1.6.3. Media Gateway Control Protocol (MGCP)
References


1.1. ITU NGN Standards for main real time services

Next generation networks (NGN) are reshaping the telecommunication industry. Two factors stand out: the need to converge and optimise the operating networks, and the extraordinary expansion of digital traffic (increasing demand for new multimedia services, including VoIP and IPTV, increasing demand for mobility, etc.). While different services converge at the level of digital transmission, the separation of distinct network “layers” (transport, control, service and application functions; see Figure 1.1) supports competition and innovation at each horizontal level of the NGN structure. At the same time, NGNs create strong commercial incentives for network operators to bundle services, and therefore to increase vertical and horizontal integration, leveraging their market power across these layers. This may call for closer regulatory and policy monitoring, so that competition and innovation in a next generation environment are not restricted. Otherwise there is a risk of reducing the benefits for consumers and the potential of the new networks for economic growth and for providing multimedia services (including the main real-time services, VoIP and IPTV) with a high level of QoS provisioning. Although a significant amount of work on NGN is underway in standardisation forums, at the policy level there is still no complete agreement on a specific definition of “NGN” or on VoIP and IPTV standards. The term is generally used to describe the shift to higher network speeds using broadband, the migration from the PSTN to an IP network, and a greater integration of services on a single network; it often represents a vision and a market concept.
ITU-T Recommendation Y.2001 (12/2004), “General overview of NGN”, gives the following definition: “A Next Generation Network (NGN) is a packet-based network able to provide Telecommunication Services to users and able to make use of multiple broadband, QoS-enabled transport technologies and in which service-related functions are independent of the underlying transport-related technologies. It enables unfettered access for users to networks and to competing service providers and services of their choice. It supports generalised mobility which will allow consistent and ubiquitous provision of services to users.”


Figure 1.1. Separation of the functional planes into two NGN strata.

NGN standardization work started in 2003 within ITU-T and is now carried out worldwide in various major telecom standardization bodies. The most active NGN-relevant standardization bodies are ITU, ETSI, ATIS, CJK and TMF. The Next Generation Mobile Networks (NGMN) initiative is a major body for mobile-specific NGN activities and an important contributor to the 3GPP specification for NGMN. The ITU (International Telecommunication Union) is an international organization within the United Nations in which governments and the private sector coordinate global telecom networks and services. ITU-T is the telecommunication standardization sector of ITU. Its mission is to produce high-quality recommendations covering all the fields of telecommunications. The pioneering NGN work was initiated in 2003 under the name JRG-NGN (Joint Rapporteur Group on NGN). Its key study topics were: NGN requirements; the general reference model; functional requirements and architecture of the NGN; and evolution to NGN. The two fundamental recommendations on NGN are Y.2001, “General overview of NGN”, and Y.2011, “General principles and general reference model for next generation networks”. These two documents contain the basic concept and definition of NGN. In May 2004, the FG-NGN (Focus Group on Next Generation Networks) was established to continue and accelerate the NGN activities initiated by the JRG-NGN. FG-NGN addressed the urgent need for an initial suite of global standards for NGN, and the NGN standardization work was mandated to it.


On 18 November 2005, the ITU-T published its NGN specification Release 1, the first global standard for NGN and a milestone in ITU’s work on NGN. Release 1, with 30 documents, specified the NGN framework, including the key features, functional architecture, component view, network evolution, etc. Because it lacks protocol specifications, ITU NGN Release 1 is not at an implementable stage; however, it is clear enough to guide the evolution of today’s telecom networks. With the publication of Release 1, the FG-NGN fulfilled its mission and was closed. As mentioned before, ITU coordinates the global efforts (including governments, regional and national SDOs, industry forums, vendors, operators, etc.) in developing the ITU recommendations. The ITU takes a three-stage approach to developing the NGN standards:
(a) Stage 1: identify service requirements;
(b) Stage 2: describe network architecture and functions to map service requirements into network capabilities; and
(c) Stage 3: define protocol capabilities to support the services.
All services (including VoIP and IPTV) and capabilities have to be specified to stage 3 to ensure that the standards are implementable. ITU’s NGN specifications are mainly contained in the Y-series and Q-series recommendations. The Y.2xxx series recommendations specify the overall characteristics of NGN, whereas the Y.19xx series recommendations specify IP Television (IPTV) over NGN. The Q.3xxx series recommendations focus on the signalling requirements and protocols for NGN. Relevant recommendations and series include:
• ITU-T Rec. Y.1901 “Requirements for the support of IPTV services”
• Y.1900-Y.1999: IPTV over NGN
• Y.2250-Y.2299: Service aspects: Interoperability of services and networks in NGN
• ITU-T Rec. Y.1910 “IPTV functional architecture”
• ITU-T Rec. H.720 “Overview of IPTV terminal devices and end systems”
• ITU-T Rec. H.721 “IPTV Terminal devices: Basic model”
• ITU-T Rec. X.1191 “Functional requirements and architecture for IPTV security aspects”


1.1.1. ITU IPTV Standards
The ITU-T IPTV Focus Group defines IPTV as follows: “IPTV is defined as multimedia services such as television/video/audio/text/graphics/data delivered over IP based networks managed to provide the required level of QoS/QoE, security, interactivity and reliability.” The IPTV standardization work within ITU is ongoing under the umbrella of a Global Standards Initiative, the IPTV-GSI. To date ITU has published two major standards for IPTV:
• ITU-T Rec. Y.1901 “Requirements for the support of IPTV services”
• ITU-T Rec. Y.1910 “IPTV functional architecture”
Y.1901 specifies the high level requirements to support IPTV services, including the following major areas: (a) general requirements on service offering, accounting and charging; (b) QoS and performance, e.g. quality of experience (QoE), traffic management; (c) security, including service and content protection, service security, network security, IPTV terminal security, subscriber security; (d) network related aspects, including multicast distribution, mobility; (e) end-system capabilities and interoperability aspects; and (f) middleware and content aspects. Y.1910 describes the IPTV functional architecture to support IPTV services. The IPTV functional architecture is based on the use of existing network components as well as the NGN architecture, leading to three possible architectures: (a) IPTV functional architecture for non-NGN network components; (b) IPTV functional architecture based on the NGN functional architecture, but not based on IMS; (c) IPTV functional architecture based on NGN and its IMS component. Y.1910 also identifies the functional entities for each of the architectures mentioned above and the reference points (i.e. the interfaces) between these functional entities. It then describes the functional capabilities of these entities and reference points, including functional entities for interworking between different IPTV functional architectures, and with third-party applications. As envisaged by ITU, the next generation of IPTV may require interoperation between service providers and/or network providers. A potential outcome of this is that a customer can go into a shop, buy an IPTV box, call their network operator and sign up, and then access services from a range of third party service providers. In addition to Y.1901 and Y.1910, there are other IPTV standards published by ITU-T, including the following:


• ITU-T Rec. H.720 “Overview of IPTV terminal devices and end systems”, which provides a high level description of the functionality of terminal devices for IPTV services;
• ITU-T Rec. H.721 “IPTV Terminal devices: Basic model”, which specifies the functionalities of IPTV terminal devices for IPTV basic services over a dedicated content delivery network, taking into account conditions on content delivery such as QoS; and
• ITU-T Rec. X.1191 “Functional requirements and architecture for IPTV security aspects”, which describes the functional requirements, architecture and mechanisms dealing with the security and protection of IPTV content, service, network, terminal devices and subscribers.

1.1.2. VoIP Standards
One of the most important emerging trends in telecommunications, and a major change in information and communication technologies, is undoubtedly Voice over IP: the transmission of voice over packet-switched IP networks. VoIP has been advocated and studied since the mid 1970s. It was the advent of DSP technology for voice compression in the late 1980s and early 1990s that gave these services the impetus they needed to enter the mainstream. Commercial-grade technologies and services started to appear in 1997, and books on the topic started to appear in 1998, with Mr. Minoli’s co-authored Delivering Voice over IP book (Wiley, April 1, 1998) being the first text on the market on this topic. A lot has transpired since then. Nowadays, enterprise NGN networks, cellular carriers, voice-over-cable carriers, emerging “quadruple play” offers, “pure-play VoIP carriers”, and even traditional voice carriers are all moving rather aggressively to a VoIP paradigm. VoIP has developed considerably in recent years and is gaining widespread public recognition and adoption through consumer solutions such as Skype and through BT’s strategy of moving to an IP-based network. The great potential for very low cost is driving the use of IP technology, but in the long term VoIP is more significant than simply introducing free phone calls: it represents a major change in telecommunications. The fact that VoIP transmits voice as digitised packets over the Internet means that it has the potential to converge with other digital technologies, which in turn will result in new services and applications becoming available. However, the adoption of VoIP, especially over mobile and wireless networks, is not without complications. The traditional PSTN telephone infrastructure has been built up over the last one hundred years or so and has developed into a robust voice communications system that provides reliability figures of nearly 100%. In contrast, VoIP is a relatively new technology with a fledgling architecture that is built on inherently less reliable data networks. There are therefore justifiable concerns around the extent to which it is deployed. However, today’s technology offers opportunities for the development


of new applications and educational services, particularly through the potential for converging voice with other media and data. In the long term, VoIP is likely to have an impact on some of the bigger picture developments within further and higher education such as virtual universities, identity management, and integration with enterprise-level services and applications. The main factors promoting VoIP, and the main barriers opposing it, are presented below. The main factors that have been promoting VoIP include:
• Low cost/no cost software (softphones and configuration tools) for PCs and PDAs;
• Wide availability of analogue telephone adapters;
• Growing availability of broadband, wireless “hot spots” and other forms of broadband access;
• Packetised voice enables much more efficient use of the network (bandwidth is only used when something is actually being transmitted);
• The VoIP network can handle connections from many applications and many users at the same time (unlike the dedicated circuit-switched approach);
• Relatively high cost of PSTN calls.
On the other hand, the main barriers opposing VoIP include:
• High quality and reliability of the PSTN;
• VoIP quality of service can be variable;
• Lack of intrinsic QoS in many IP networks around the world;
• Many challenges for wireless VoIP users;
• Limitations in some VoIP features and services and in interconnection between VoIP service providers;
• Relative difficulty in setup and use;
• Problems with the end-to-end integrity of the signalling and bearer path;
• Introduction of call plans and flat rate charges by traditional PSTN operators.
In packet-based networks, such as IP-based networks, no permanent physical connection is established. For VoIP, however, the communicating devices at the end-points build up a connection using corresponding protocols (such as H.323 or SIP: IETF RFC 3261, 2002; ITU–T Rec. H.323, 1998). The International Telecommunication Union (ITU) and the Internet Engineering Task Force (IETF) are the two major international organisations recommending standards for VoIP: the ITU recommends H.323 and the IETF recommends the Session Initiation Protocol (SIP). While there is some overlap of functionality, there are differences in approach and terminology. The prior setup of the link basically settles the agreement between the two end-points that speech data will be exchanged between them. The connection-oriented Transmission Control Protocol (TCP) is typically applied only at this stage of the communication setup. After the connection is established, coded speech data are packetized into packets that are sent from source to destination.


At different levels of packetization, header information is added to the speech data payload, successively increasing the packet size. The successive packetization steps and the protocols involved are illustrated in Table 1.1; the resulting packet structure and header information are shown in simplified form in Figure 1.2. The headers can be compressed to reduce the amount of data to be sent across the network (e.g. IETF RFC 3095, 2001). Note also that while the TCP used for connection setup is connection-oriented, requiring acknowledgments between endpoints, the User Datagram Protocol (UDP) typically used for the transport of the speech data is connectionless, and hence yields fewer and smaller packets (see Table 1.2).
Table 1.1: Protocols and media access technologies involved in VoIP packetization.

* RTP: Real-time Transport Protocol (IETF RFC 3550, 2003); UDP: User Datagram Protocol (IETF RFC 768, 1980); IP: Internet Protocol; WLAN: Wireless Local Area Network (IEEE Std 802.11, 2005).

Figure 1.2. Illustration of the headers in VoIP.


Table 1.2: Header sizes of different protocols involved in VoIP.
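The effect of the header overhead described above can be illustrated with a short calculation. The following is a minimal sketch, not taken from the original tables: it assumes uncompressed headers at their minimum sizes (12 bytes for RTP, 8 bytes for UDP, 20 bytes for IPv4), ignores link-layer headers, and uses a hypothetical helper name `voip_packet_stats`. The 20 ms packetization interval is also an assumption chosen for illustration.

```python
# Minimum header sizes in bytes (assumed values, uncompressed headers).
RTP_HEADER = 12   # fixed RTP header without CSRC entries (RFC 3550)
UDP_HEADER = 8    # UDP header (RFC 768)
IPV4_HEADER = 20  # IPv4 header without options


def voip_packet_stats(codec_bitrate_bps, frame_ms):
    """Return (payload_bytes, packet_bytes, ip_bitrate_bps) for one voice stream."""
    # bits per second * milliseconds / 1000 / 8 = payload bytes per packet
    payload = codec_bitrate_bps * frame_ms // 8000
    packet = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    packets_per_second = 1000 / frame_ms
    return payload, packet, packet * 8 * packets_per_second


if __name__ == "__main__":
    # Nominal codec bit rates: G.711 at 64 kbit/s, G.729 at 8 kbit/s.
    for name, rate in (("G.711", 64000), ("G.729", 8000)):
        payload, packet, bitrate = voip_packet_stats(rate, 20)
        print(f"{name}: payload {payload} B, packet {packet} B, "
              f"{bitrate / 1000:.1f} kbit/s at the IP layer")
```

Under these assumptions the 40 bytes of headers are a modest addition to a G.711 packet but are twice the size of a G.729 payload, which is why header compression matters most for low bit rate codecs.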

On sending, the speech packets ‘search’ their way through the network, where they are routed from one node to the next based on the destination address they carry. Consequently, successive packets may take different paths on their way to the destination. In case of congestion at some point in the network, they may arrive out of order or simply with considerable and/or varying delay (delay and jitter are the main QoS parameters for VoIP). Efficient speech communication cannot be carried out if the transmission delay becomes too large (more than 150 ms). Hence, packets arriving too late for timely playback may be discarded by the receiver (packet loss). Similarly, if a router in the network is faced with too many packets during a traffic-burst period, it may have to drop packets. Why do we need standards in VoIP? The answer is simple: as with any communications technology, VoIP needs well-defined and industry-supported methods for signalling call control information in order to succeed. Without such standards, communication between users becomes at best severely restricted or difficult to achieve. Initial implementations of commercial VoIP solutions used proprietary techniques until industry developed a consensus around the use of ITU-T’s multimedia conferencing standards as a useful starting point. More specifically, it was the development and promotion of the H.323 specification that provided the initial focal point within the industry. Other relevant VoIP standards and recommendations include:
• H.225 defines the lowest layer, which formats the transmitted video, audio, data, and control streams for output to the network, and retrieves the corresponding streams from the network;
• H.235 specifies the security requirements for H.323 communications. Four security services are provided: authentication, integrity, privacy, and non-repudiation;
• H.245 specifies messages for opening and closing channels for media streams and other commands, requests and indications;
• H.248, also known as Megaco (MEdia GAteway COntrol), is a co-operative standard from IETF and ITU, also described in RFC 3015. It addresses the same requirements as MGCP and has many similarities to it;
• H.261: if video capabilities are provided, they must adhere to the H.261 protocol with QCIF as the mode;
• H.263 specifies the CODEC for video over the PSTN;


• Various audio CODECs are specified under G.711, G.722, G.723, G.723.1, G.726, G.729 and G.729a;
• T.120 is a protocol for data and conference control.
Over 120 leading computer, telecommunication and technology organisations have indicated their intent to support and implement H.323 in their products and services. This wide-ranging support establishes H.323 as the de facto standard for audio and video conferencing over the Internet. Sections 1.5 and 1.6 give more information and details about VoIP protocols, codecs and signalling (H.323, SIP, MGCP).
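The delay, jitter and late-packet loss discussed earlier in this section can be made concrete with a small sketch. It uses the running interarrival jitter estimate defined in RFC 3550 and, as a deliberate simplification, treats a packet as lost when its one-way transit time exceeds the 150 ms bound quoted in the text (a real receiver would compare against the playout deadline of its jitter buffer). The function names and the sample values are hypothetical.

```python
MAX_DELAY_MS = 150.0  # delay bound quoted in the text


def interarrival_jitter(transit_times_ms):
    """Running jitter estimate J += (|D| - J) / 16, as defined in RFC 3550.

    transit_times_ms holds the per-packet transit time (arrival time minus
    send timestamp) in packet order.
    """
    jitter = 0.0
    for previous, current in zip(transit_times_ms, transit_times_ms[1:]):
        d = abs(current - previous)  # change in transit time between packets
        jitter += (d - jitter) / 16.0
    return jitter


def late_packets(transit_times_ms, max_delay_ms=MAX_DELAY_MS):
    """Indices of packets that arrive too late for timely playback."""
    return [i for i, t in enumerate(transit_times_ms) if t > max_delay_ms]


if __name__ == "__main__":
    transit = [40.0, 42.0, 41.0, 180.0, 45.0, 43.0]  # made-up sample values
    print("jitter estimate (ms):", round(interarrival_jitter(transit), 2))
    print("discarded as late:", late_packets(transit))
```

A constant transit time gives a jitter of zero regardless of how large the delay is, which shows why delay and jitter are tracked as separate QoS parameters.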


1.2. NGN and Internet fundamentals

The NGN is an evolutionary process, and operators can be expected to take different migration paths: switching to NGN while gradually phasing out existing circuit networks, or building a fully IP-enabled network. The investment in developing NGN is motivated by several factors (Table 1.3). Telecommunication operators across the world have been faced with a decline in the number of fixed-line telephone subscribers, coupled with a decrease in average revenue per user (ARPU), as a result of competition from mobile and broadband services. Traditional sources of revenue (voice communications) have declined rapidly, and fixed-line operators are subject to increasing competitive pressure to lower tariffs and offer innovative services. This has generated pressure from the investor community to decrease the cost and complexity of managing multiple legacy networks, by disinvesting from non-core assets and reducing operational and capital expenses.
Table 1.3. NGN drivers

In this context, the migration from separate network infrastructures to next generation core networks is a logical evolution, allowing operators to open up the development of new offers of innovative content and interactive, integrated services, with the objective of retaining the user base, attracting new users, and increasing ARPU (see Box 3 in Table 1.3). NGN is therefore often considered essential for network operators to be “more than bit pipes” and to strategically


position themselves to compete in the increasingly converged world of services and content, where voice is no longer the main source of revenue and may become a simple commodity. Investment in next generation access networks, both wired and wireless, will be necessary in order to support the new services enabled by the IP-based environment and to provide increased quality. At the same time, the substantial investment necessary to develop next generation infrastructures raises new economic and regulatory issues, which will be analysed in the following sections. Although the migration to all-IP networks is taking place at different paces in different countries, operators in several countries across the world have already updated their transport networks and are now dealing with NGN at the local access level. Solutions embraced by fixed operators may also increasingly support the IP Multimedia Subsystem (IMS), to enable fixed-mobile convergence. For the moment the most common services provided through the new networks are PSTN/ISDN emulation services, i.e. the provision of PSTN/ISDN service capabilities and interfaces using adaptation to an IP infrastructure, and video on demand (VoD). At the same time the business world is showing an increasing interest in new NGN-enabled services and applications. Companies are migrating their Time Division Multiplexing switches to IP in order to enable integrated applications for specific industry-based functionalities and purposes. Progress in the field of mobile (cellular) communications is taking shape with the development of the IMS standard. For the moment two services have been standardised under IMS: Push to Talk over Cellular (PoC) and Video Sharing. Prominent telecommunication network equipment suppliers are actively supporting the take-up of IMS, and some of them are implementing IMS strategies and commercial IMS products.
IMS is seen as the enabler for the migration of mobile operators to next generation networks and therefore for the implementation of fixed-mobile convergence. No evident killer application has emerged so far, and many operators are focusing on one specific service: voice. Facilitating the use of voice applications, enabling users to handle their calls easily between fixed and mobile networks and to receive calls wherever they are, is fundamental for the take-up of the service. Operating in an IMS environment would allow a seamless handover from WLAN (fixed) to mobile during calls (Voice Call Continuity). In order for real-time voice calls to be offered seamlessly between the circuit switched domain and the Wireless LAN interworking with the IMS architecture, the Third Generation Partnership Project (3GPP) is working to develop the appropriate Technical Specifications to define this functionality as a standard 3GPP feature; this work is underway. In the meantime, fixed-mobile converged services have been launched by some mobile operators with access to fixed networks, using a different standard, Unlicensed Mobile Access (UMA), which allows users to switch seamlessly from fixed to mobile networks (see below, paragraph on Fixed Mobile Convergence).


In addition, increasing competitive pressure on mobile carriers is coming from the IP world. Thanks to the availability of dual-use devices and Wi-Fi hotspots, service providers such as Skype, Google, and others are able to bring a host of new services for mobile users to market in a very short period of time. This speed constitutes an important comparative advantage, which in some cases provokes a reaction from mobile operators (and manufacturers), who tend to limit the services and applications users can access from their mobile handsets. Moreover, the technological developments associated with next generation networks should help combine the characteristics of the traditional telecommunication model and of the new Internet model, dissolving the current divisions and moving towards a harmonised and coherent approach across different platforms, gradually bringing fixed and mobile networks, voice, data services, and broadcasting sectors to full convergence. In short, in the future the choice of the technology used for the infrastructure or for access will no longer have an impact on the kinds and variety of services that are delivered. This, however, does not reflect the current situation, where the two worlds still have different visions and commercial models (Figure 1.3).

Figure 1.3 NGN Convergent model

The telecommunications tradition emphasises the benefits of higher capacity local fibre access facilities, and powerful network intelligence. Access in this context should be simple and reliable, with centralised network management and control to guarantee the seamless provision of a wide range of services, bundled network-content-applications offers, and one-stop shop solutions. On the other hand, the Internet world traditionally focuses on edge innovation and control over network use, user empowerment, freedom to choose

and create applications and content, open and unfettered access to networks, content, services and applications. Freedom at the edges is considered more important than superior speed of managed next generation access networks. Indeed, the “Internet” still represents different things to different people, and next generation networks are seen as both a possibility for improved services or as a way to constrain the Internet into telecommunication boundaries, adding new control layers, capable of discriminating between different content, and “monetise” every single service accessed. Services provided over next generation networks allegedly will differ from services currently provided over the public Internet which is based on a “best effort” approach, where the quality of transmission may vary depending on traffic loading and congestion in the network, while with NGN packet delivery is enhanced with Multi Protocol Label Switching (MPLS). This allows operators to ensure a certain degree of Quality of Service – similar to the more constant quality of circuit switched networks – through traffic prioritisation, resource reservation, and other network-based control techniques, as well as to optimise network billing as in circuit-switched transport. The concept of network-based control seems to be the main difference between the public Internet approach and next generation managed IP networks approach. NGN offers the possibility to provide a detailed service control and security from within the network, so that networks are aware of both the services that they are carrying and the users for whom they are carrying them, and are able to respond in different ways to this information. In contrast, the Internet aims to provide basic transmission, remaining unaware of the packets/services supported. 
While the Internet model therefore remains completely open to users and new applications and services, in managed IP networks operators are able to control the content going through the network. In turn, this may have negative implications for the content of third party providers if their traffic is discriminated against in relation to that of an integrated operator. One thing is clear, however: the winning NGN protocol stack is expected to be IP/Ethernet/Optical, because it gives the most intelligent infrastructure solution for NGN. This intelligent IP/Ethernet NGN structure is shown in Figure 1.4. Finally, an example of an NGN transport and service configuration is shown in Figure 1.5, together with the main functions that are supported by the NGN Release 1 specifications. In Release 1 all services are carried over IP, although IP itself may in turn be carried over a number of underlying technologies, such as ATM, Ethernet, etc. Release 1 assumes IPv4 or IPv6 networking at packet interconnection points and packet network interfaces and therefore focuses on the definition of IP packet interfaces.


Figure 1.4 Intelligent Infrastructure for IP NGN

Figure 1.5 Transport and service configuration of NGN


1.3. VoIP infrastructure

This section describes the main VoIP infrastructures in use worldwide, which offer high reliability, functionality and QoS provisioning. The main elements of the VoIP infrastructure are illustrated in Figure 1.6.

Figure 1.6. Basic VoIP Infrastructure

As Figure 1.6 shows, the main VoIP entities are the following:
VoIP servers: These are the components responsible for processing the VoIP signalling messages, routing the signalling messages to the correct destination and possibly executing additional services such as user authentication or PSTN-like services such as call forwarding. The VoIP servers shown here use the Session Initiation Protocol (SIP), the most widely used signalling protocol, but this does not exclude the use of H.323.
PSTN gateways: Often also called media gateways, these are the components connecting the Internet to the PSTN and hence enabling calls from the Internet to the PSTN and vice versa. These components have the following functionalities:
- Termination of PSTN signalling
- Transcoding functionality (voice encoding from G.711 to other media encodings such as G.727, G.729, etc.)
- Splitting voice samples into RTP packets
Address resolution servers: VoIP addresses are usually described as URIs in the form of “sip:user@domain” or as an E.164 number such as “+49303030”. As with any Internet service, there is a need to translate between the high level service-specific names and IP addresses. For this there are two major components in the VoIP architecture:
- DNS servers: Domain name servers constitute a distributed database enabling the mapping between domain names and IP addresses.
- ENUM servers: ENUM is a service enabling the mapping between an E.164 number and a SIP URI (for more details, see the next Module).
Authentication, Authorization and Accounting servers: AAA servers contain the information necessary to authenticate a user (e.g., a password) as well as the user profile. The user profile generally indicates the user-specific services, such as white and black lists or call forwarding settings. AAA servers are usually based on protocols such as RADIUS or DIAMETER.
NAT traversal support: SIP carries addressing information of the communicating end parties inside the signalling messages. When private addresses are used, additional mechanisms are therefore needed to assist the end systems in traversing network address translators (NATs), as clients which advertise a private address cannot be contacted from the public Internet. Some of these mechanisms to map private addresses to public ones are located directly at the NATs; others require additional servers to be provided by the VoIP provider (see Sec. 3.9 for details):
- STUN: If a NAT itself cannot provide the mapping from the client’s private address to a publicly visible one, the client can contact a STUN server. This server provides mechanisms to detect the public address of the client. The client is thus able to generate messages that advertise its public address.
- RTP proxy: Some special kinds of NATs don’t allow incoming connections to a client unless the connection was initiated by that client. If two clients behind such NATs want to establish an RTP connection, one client needs to contact a public host (the RTP proxy) that allows incoming connections and as such proxies the traffic between both user clients.
Application servers: Application servers are components that enhance the VoIP service with additional services such as conferencing, voicemail or integration with other applications such as calendars or media players.
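The ENUM mapping mentioned above can be sketched in a few lines. ENUM derives a DNS name from an E.164 number by keeping only the digits, reversing them, separating them with dots and appending the suffix e164.arpa; the resulting name is then looked up in the DNS. In the sketch below the function names and the dictionary that stands in for the DNS records are hypothetical, and the SIP URI is an invented example.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Build the domain name that an ENUM client would look up for a number."""
    digits = [ch for ch in e164_number if ch.isdigit()]  # drop '+', spaces, etc.
    if not digits:
        raise ValueError("no digits in E.164 number")
    return ".".join(reversed(digits)) + "." + suffix


def lookup_sip_uri(e164_number, enum_records):
    """Return the SIP URI stored for the number, or None if there is no entry."""
    return enum_records.get(enum_domain(e164_number))


if __name__ == "__main__":
    # Hypothetical stand-in for the records that would be held in the DNS.
    records = {"0.3.0.3.0.3.9.4.e164.arpa": "sip:user@example.com"}
    print(enum_domain("+49303030"))
    print(lookup_sip_uri("+49303030", records))
```

Reversing the digits puts the country code closest to the DNS root, so that responsibility for number ranges can be delegated in the same hierarchical way as ordinary domain names.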
The most important SIP operation is that of inviting new participants to a call. Because SIP is the most widely used signalling protocol in VoIP (alongside others such as H.323 and MGCP), we focus here on VoIP infrastructure solutions for SIP. Sections 1.5 and 1.6 describe the other VoIP signalling protocols and architectures in more detail. To describe the SIP NGN VoIP infrastructure, we first need to explain the main SIP functionalities and SIP entities:

Registrar: User agents contact registrar servers to announce their presence in the network. The registrar server is essentially a database containing locations as well as user preferences as indicated by the user agents.


Proxy: A proxy server receives a request which it forwards towards the current location of the callee, either directly to the callee or to another server that might be better informed about the actual location of the callee.

Redirect: A redirect server receives a request and informs the caller about the next hop server. The caller then contacts the next hop server directly.

User Agent: A logical entity in the terminal equipment that is responsible for generating and terminating SIP requests.

In SIP, a user is identified through a SIP URI in the form of sip:user@domain. This address can be resolved to a SIP proxy that is responsible for the user’s domain. To identify the actual location of the user in terms of an IP address, the user needs to register his IP address at the SIP registrar responsible for his domain (see Figure 1.7).

Figure 1.7. SIP register flow

Figure 1.8. SIP INVITE flow via Proxy

When inviting a user, the caller sends the invitation to the SIP proxy responsible for the user’s domain, which looks up the user’s location in the registrar’s database and forwards the invitation to the callee (see Figure 1.8). The callee can either accept or reject the invitation. The session initiation is then finalized by the caller acknowledging receipt of the callee’s answer.
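The register and invite steps can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, no real SIP messages are parsed or sent, and the addresses are documentation examples.

```python
from typing import Dict, Optional


class Registrar:
    """Location database: maps a SIP URI to the contact address it registered."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, sip_uri: str, contact: str) -> None:
        # Corresponds to the registration step (Figure 1.7).
        self._bindings[sip_uri] = contact

    def lookup(self, sip_uri: str) -> Optional[str]:
        return self._bindings.get(sip_uri)


class Proxy:
    """Proxy for one domain; consults the registrar to locate the callee."""

    def __init__(self, domain: str, registrar: Registrar) -> None:
        self.domain = domain
        self.registrar = registrar

    def route_invite(self, callee_uri: str) -> Optional[str]:
        # Corresponds to the invitation step (Figure 1.8).
        _, _, callee_domain = callee_uri.partition("@")
        if callee_domain != self.domain:
            return None  # not our domain; a real proxy would forward elsewhere
        return self.registrar.lookup(callee_uri)


registrar = Registrar()
proxy = Proxy("example.org", registrar)

# The callee's user agent registers its current location first.
registrar.register("sip:bob@example.org", "192.0.2.10:5060")

# The caller's invitation is then routed to that location.
bob_location = proxy.route_invite("sip:bob@example.org")
print(bob_location)
```

In this sketch an unregistered user simply yields `None`; how a real proxy answers in that case is outside the scope of the example.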

During this message exchange, the caller and callee exchange the addresses at which they would like to receive the media and the kinds of media they can accept. After the session is established, the end systems can exchange data directly without the involvement of any SIP proxy.

As an example of a VoIP infrastructure solution, the Cisco VoIP Infrastructure Solution for SIP is illustrated next. It can be approached in two ways: an intranetwork approach and an internetwork approach. As a first step toward a total SIP-based VoIP solution in the intranetwork approach, VoIP gateways configured to support SIP are implemented to replace the traditional DAL and bypass carrier toll lines. In Figure 1.9, Cisco SIP gateways and an IP network have been introduced between the private branch exchanges (PBXs).

Figure 1.9. Toll Bypass and DAL Replacement

As the next step, SIP proxy servers are used to provide support for a scalable private number plan. In Figure 1.10, SIP proxy servers have been added to the previous IP network.

Figure 1.10. Scalable Private Number Plan Support

Moreover, in the next step, Cisco SIP IP phones are added. These phones connect directly to the IP network and, when used with the other SIP components, provide features such as call hold, call waiting, call transfer, and call forwarding. In Figure 1.11, Cisco SIP IP phones have been connected directly to the IP network. As the next step, application services (such as a RADIUS server) are integrated with the SIP proxy servers. This enables the SIP proxy servers to perform authentication (via HTTP digest). It also provides end customers with enhanced services, such as "find me" and call screening. The Cisco SIP


gateways interface with the application services using AAA and RADIUS for billing purposes.

Figure 1.11. Cisco SIP IP phone Support

In Figure 1.12, application servers have been added to the IP network to interface with the SIP proxy servers.

Figure 1.12. Application Services Support

In the final step, a unified-messaging server is added to provide voice mail. In Figure 1.13, a unified-messaging server has been added to the IP network.

Figure 1.13. IP Telephony Services with Unified Messaging


The intranetwork phase of the VoIP infrastructure (Solution for SIP) can be summarised as follows. At the center is a QoS-enabled IP network using Cisco internetworking equipment with a set of Cisco SIP gateways and one or more SIP proxy servers. The Cisco SIP gateways are connected to the PBXs via T1 or E1 lines with channel-associated signaling (CAS) or primary-rate-interface (PRI) signaling. Several traditional telephones or fax machines are connected to the PBXs. Cisco SIP IP phones are connected directly to the IP network. A server running a unified-messaging application is also connected to the IP network. SIP is used for signaling (or session initiation) between the SIP clients, Cisco SIP IP phones, Cisco SIP gateways, and SIP proxy servers. RTP/RTCP is used to transmit voice data between the SIP endpoints after sessions are established. As this example shows, the Cisco VoIP Infrastructure Solution for SIP is designed not only to provide an alternative to traditional telephony equipment, but also to interact with existing equipment.

The following illustrates a possible internetwork phase implementation of the Cisco VoIP Infrastructure Solution for SIP, which integrates a SIP-enabled VoIP network with a public switched telephone network (PSTN) infrastructure. This phased approach builds on an existing SIP VoIP network as outlined in the intranetwork phased approach above. As the first step of the internetwork phased approach, Cisco Secure PIX Firewalls are added to the existing intranetwork for inside network security. In Figure 1.14, Cisco Secure PIX Firewalls have been added to the IP network.

Figure 1.14. The Cisco Secure PIX Firewall in a SIP Network


The final internetwork phase is to implement the Cisco SS7 Interconnect for Voice Gateways Solution for integrating the SIP-enabled VoIP network with a PSTN infrastructure. In Figure 1.15, Cisco SS7 Interconnect for Voice Gateways Solution components have been added.

Figure 1.15. Cisco SS7 Interconnect for Voice Gateways Solution Implemented with a SIP VoIP Network

From the ITU-T Next Generation Mobile Network (NGMN) perspective, mobile VoIP is assumed to be a seamless service, i.e. a VoIP service implemented so that mobile users do not experience any service disruption while changing their point of attachment. Mobile VoIP service requires the support of service continuity for terminal mobility, taking into account network conditions (e.g. the number of user sessions, mobility events and bandwidth consumption) and users’ requirements. Figure 1.16 illustrates a general network architecture involving two operators supporting different types of access networks, i.e. cellular access networks (such as 3G), WiFi access networks and mobile WiMAX access networks, in which users of the mobile VoIP service may move between different access networks in the same operator domain or between different operator domains. As shown in Figure 1.16, NGN architectural components, i.e. service control functions (SCF), mobility management and control functions (MMCF), network attachment control functions (NACF) and resource and admission control functions (RACF), are assumed to be used for supporting the QoS-enabled mobile VoIP service. For more details about the QoS provisioning architecture in VoIP, see section 2.2 (in Module 2).

Figure 1.16 - General network architecture for QoS enabled mobile VoIP service


1.4. Peer-to-peer VoIP

The term peer-to-peer (P2P) refers to the concept that in a network of equals (peers, see Figure 1.17) using appropriate information and communication systems, two or more individuals are able to spontaneously collaborate without necessarily needing central coordination. In contrast to client/server networks, P2P networks promise improved scalability, lower cost of ownership, self-organized and decentralized coordination of previously underused or limited resources, greater fault tolerance, and better support for building ad hoc networks. In addition, P2P networks provide opportunities for new user scenarios that could scarcely be implemented using customary approaches.

Figure 1.17. Illustration of a peer-to-peer architecture

The core characteristics of P2P networks are the following:

Sharing of distributed resources and services: In a P2P network each node can provide both client and server functionality, that is, it can act as both a provider and a consumer of services or resources, such as information, files, bandwidth, storage and processor cycles. Occasionally, these network nodes are referred to as servents (derived from the terms server and client).

Decentralization: There is no central coordinating authority for the organization of the network (setup aspect) or the use of resources and communication between the peers in the network (sequence aspect).


This applies in particular to the fact that no node has central control over the others. In this respect, communication between peers takes place directly. Frequently, a distinction is made between pure and hybrid P2P networks. Because all components share equal rights and equivalent functions, pure P2P networks represent the reference type of P2P design. Within these structures there is no entity that has a global view of the network. In hybrid P2P networks, selected functions, such as indexing or authentication, are allocated to a subset of nodes that, as a result, assume the role of a coordinating entity. This type of network architecture combines P2P and client/server principles.

Autonomy: Each node in a P2P network can autonomously determine when and to what extent it makes its resources available to other entities.

P2P technology first came into focus through file sharing applications such as Napster, Kazaa and BitTorrent, which allow users to share their own files as well as search for and download the files of other users on the network. Instead of relying on a centralized client-server relationship, a peer-to-peer network gets its strength from each individual node, with every new member adding bandwidth and processing power for the good of the many. Peer-to-peer services can therefore scale without the use of expensive central servers, which is very rewarding from a cost standpoint. Peer-to-peer Internet VoIP is, like Napster, a software application that you download onto your computer from a peer-to-peer VoIP service provider. The software, or “soft phone” as it is called, is free to download, and calls to or from anyone on the network are free. The only hardware you need is a headset, or a microphone and speakers. Internet telephony headsets are cheap and either connect via USB or plug directly into your sound card.
For those with web cams, many providers allow you to make video calls to others on the network for free. Services offered with this technology go beyond those of the telcos, with conference calls, call forwarding, instant messaging and chat: peer-to-peer Internet telephony literally turns your computer into a telephone/videophone communications center. As with the traditional VoIP providers, calls within the network are free worldwide, but calls to a PSTN number, where that option exists, will usually cost you. In some cases you can have different numbers in other locations, so that people can call you toll free from a land line, even from other countries. Even if you do have to pay to reach the PSTN, the rates are much cheaper than those of a telecommunications company. Just like the other forms of VoIP, developers have had some technological hurdles to overcome. Quality of Service, NATs and firewalls, and centralized directories of members using dynamic IP addresses are just a few. Also, as calls and instant messages are routed through the public Internet, encryption is a must for any user. However, peer-to-peer calls become important if callers use features like push-to-talk, video, and mesh-based audio conferencing. The VoIP versions of these features cannot be transmitted over the PSTN. A peer-to-peer VoIP


call occurs when two VoIP phones communicate directly over IP without IP PBXs between them. A peer-to-peer call can be initiated directly, by calling a phone’s SIP URI, or indirectly, by dialing a phone number. Probably the best known peer-to-peer VoIP applications are Skype, Yahoo IM and MSN. To explain the peer-to-peer VoIP concept, we focus here on Skype as an example. Skype was developed by the organization that created Kazaa. It allows its users to place voice calls and send text messages to other users of Skype clients. In essence, it is very similar to the MSN and Yahoo IM applications, as it has capabilities for voice calls, instant messaging, audio conferencing, and buddy lists. However, the underlying protocols and techniques it employs are quite different. Like its file sharing predecessor Kazaa, Skype uses an overlay peer-to-peer network. There are two types of nodes in this overlay network: ordinary hosts and super nodes (SN). An ordinary host is a Skype application that can be used to place voice calls and send text messages. A super node is an ordinary host’s end-point on the Skype network. Any node with a public IP address and sufficient CPU, memory, and network bandwidth is a candidate to become a super node. An ordinary host must connect to a super node and must authenticate itself with the Skype login server. Although not a Skype node itself, the Skype login server is an important entity in the Skype network, as user names and passwords are stored there. This server ensures that Skype login names are unique across the Skype name space. Starting with Skype version 1.2, the buddy list is also stored on the login server. Figure 1.18 illustrates the relationship between ordinary hosts, super nodes and the login server; Figure 1.19 shows a world map of the super nodes to which Skype establishes a TCP connection at login.
Apart from the login server, there are SkypeOut and SkypeIn servers which provide PC-to-PSTN and PSTN-to-PC bridging. SkypeOut and SkypeIn servers do not play a role in PC-to-PC call establishment and hence we do not consider them to be a part of the Skype peer-to-peer network. Thus, we consider the login server to be the only central component in the Skype peer-to-peer network. Online and offline user information is stored and propagated in a decentralized fashion. The Skype login process is illustrated in Figure 1.20. In this process, the Skype client (SC) sends UDP packets of length 18 bytes to all bootstrap SNs. After 5 s, it attempts TCP connections to the IP addresses of the seven bootstrap SNs on port 33033. Authentication with the login server is not shown in Figure 1.20.


Figure 1.18 - Skype P2P Network

Figure 1.19 - Worldmap of super nodes to which Skype establishes a TCP connection at login

The Skype network is an overlay network and thus each Skype client (SC) needs to build and refresh a table of reachable nodes. In Skype, this table is called the host cache (HC) and it contains the IP addresses and port numbers of super nodes. In addition to maintaining the host cache, the Skype client listens on particular ports for incoming calls, uses wideband codecs, maintains a buddy list, encrypts messages end-to-end, and determines whether it is behind a NAT or a firewall.


Figure 1.20 - Skype login process

Starting with Skype v1.0, the HC is stored in an XML file. Skype has also implemented a ‘3G P2P’ or ‘Global Index’ technology, which is guaranteed to find a user if that user has logged in to the Skype network in the last 72 hours. Skype uses wideband codecs, which allow it to maintain reasonable call quality at an available bandwidth of 32 kb/s. It uses TCP for signaling, and both UDP and TCP for transporting media traffic.

The key features of Skype are summarized below:
- Online and offline user information is stored and propagated in a decentralized fashion, and so are the user search queries.
- Skype is able to traverse firewalls and NATs by using a variant of the STUN and TURN protocols to determine the type of NAT and firewall it is behind.
- Skype uses wideband codecs (iLBC and iSAC) to maintain reasonable call quality at an available bandwidth of 32 kb/s. Skype codecs allow frequencies between 50 and 8,000 Hz to pass through.
- It uses TCP for signaling, and both UDP and TCP for transporting media traffic. Signaling and media traffic are not sent on the same ports.
- Skype uses 256-bit encryption known as AES (Advanced Encryption Standard), which has a total of 1.1 x 10^77 possible keys, to encrypt the data in each Skype call or instant message. It also uses 1536 to 2048 bit RSA to negotiate symmetric AES keys. User public keys are certified by the Skype server at login.


- Skype has also implemented a ‘3G P2P’ or ‘Global Index’ technology which is able to find a user if that user has logged in to the Skype network in the last 72 hours.
- Silence suppression is not supported, in order to maintain UDP bindings and the TCP congestion window size.

Skype functions can be classified into startup, login, user search, call establishment and teardown, media transfer, and presence messages. These functions are explained below:

Startup: When Skype runs for the first time, it contacts the Skype server with an HTTP message to get the latest version.

Login: During this process, the Skype client authenticates its user name and password with the login server, advertises its presence to other peers and its buddies, determines the type of NAT and firewall it is behind, and discovers online Skype nodes with public IP addresses.

Search: Skype uses its Global Index (GI) technology to search for a user.

Call establishment and teardown: If both users are on public IP addresses, are online and are in each other’s buddy lists, then when the call button is pressed the caller’s Skype client establishes a TCP connection with the callee’s client. Signaling information is exchanged over TCP. During call teardown, signaling information is exchanged over TCP between caller and callee if they are both on public IP addresses, or between caller, callee and their respective SNs.

Compared to the Yahoo, MSN, and Google Talk applications, Skype showed the best mouth-to-ear latency. Skype is also a selfish application: it tries to obtain the best available network and CPU resources for its execution. It changes its application priority to high priority in Windows while a call is established. It evades blocking by routing its login messages over SNs. This also implies that Skype relies on SNs, which can misbehave, to route login messages to the login server.
Skype does not allow a user to prevent its machine from becoming a SN, although it is possible to prevent Skype from becoming a SN by putting a bandwidth limiter on the Skype application when no call is in progress. Theoretically speaking, if all Skype users decided to put a bandwidth limiter on their application, the Skype network could collapse, since the SNs hosted by Skype may not have enough bandwidth to relay all calls.
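The key-space figure quoted for 256-bit AES in the feature list above can be checked with a few lines of arithmetic. This is only a sanity check of the number of possible keys, not a statement about how Skype implements its encryption.

```python
# Number of distinct keys for a key of the given length in bits.
key_bits = 256
key_space = 2 ** key_bits

# Python integers are exact, so the digit count gives the order of magnitude.
digits = len(str(key_space))            # 78 digits, i.e. on the order of 10^77
approx = f"{key_space:.2e}"             # scientific notation, rounded

print(digits, approx)
```

The result is roughly 1.16 x 10^77, consistent with the "1.1 x 10^77" quoted in the text (which truncates rather than rounds).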


1.5. VoIP protocols and codecs

Besides the SIP and H.323 VoIP protocols (mentioned in section 1.1), there are a number of other protocols that may be used in VoIP applications. Most of these protocols will interoperate with the H.323 standards, but some may not. The main other VoIP protocols include:

Media Gateway Control Protocol (MGCP): A development of the SGCP and IPDC protocols. It is a signaling and control protocol for controlling Voice over IP (VoIP) gateways from external media gateway controllers or call agents. A VoIP gateway is a part of a network that provides conversion between the audio signals carried on telephone circuits and data packets carried over the Internet or over other packet networks. Media Gateway Control (MEGACO), also known as H.248, is an enhanced version of MGCP. MGCP is specified in RFC 3435, Media Gateway Control Protocol Version 1.

IP Device Control (IPDC): A group of protocols for controlling hardware devices such as gateway devices at the boundary between the circuit-switched telephone network and the Internet. Examples of such devices include network access servers and VoIP gateways.

Real Time Transport Protocol (RTP): Described in IETF RFC 1889 (since obsoleted by RFC 3550), this is a real-time, end-to-end protocol that uses existing transport layers for data with real-time properties.

RTP Control Protocol (RTCP): Described in IETF RFC 1889, a protocol to monitor QoS and carry information on the participants in a session. It also provides feedback on overall performance and quality so that adjustments can be made.

Resource Reservation Protocol (RSVP): Described in IETF RFC 2205-2209. This is a general purpose signalling protocol allowing network resources to be reserved for a connection's data stream, based on receiver-controlled requests. There may be scalability issues in using this protocol because it focuses on and manages individual application traffic flows.

Simple Gateway Control Protocol (SGCP): SGCP is a simple "remote control" protocol that the call agent uses to program gateways according to instructions received through signalling protocols such as H.323 or SIP. It has now been superseded by MGCP.

Session Announcement Protocol (SAP): Protocol used by multicast session managers to distribute a multicast session description to a large group.

Real Time Streaming Protocol (RTSP): Interface management to a server providing real-time data.


Session Description Protocol (SDP): Describes the session for other protocols including SAP, SIP and RTSP.

In common with many communication and data systems, the protocols used in VoIP generally follow a layered hierarchy, similar to the Open Systems Interconnection (OSI) reference model developed by the International Organization for Standardization (ISO). There are, however, exceptions to this, for example IP over ATM. Table 1.4 provides an overview of the principal VoIP protocols (as described by Cisco).
Table 1.4. Illustration of the main VoIP protocols
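To make the role of RTP in the protocol list more concrete, the following sketch packs one voice frame into an RTP packet. It assumes the 12-byte fixed header layout of the RTP specification (RFC 3550, the successor of RFC 1889) and payload type 0 for G.711 mu-law audio; the function name and the sample values are illustrative only, and no packet is actually sent.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prepend a minimal 12-byte RTP header (no padding, extension or CSRCs)."""
    version = 2
    byte0 = version << 6                                  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,                    # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,          # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)               # 32-bit source identifier
    return header + payload


# 20 ms of 8 kHz, 8-bit audio is 160 samples, i.e. 160 payload bytes.
frame = bytes(160)
packet = build_rtp_packet(frame, seq=1, timestamp=160, ssrc=0x1234)
print(len(packet))
```

Each successive frame would increment the sequence number by one and the timestamp by the number of samples in the frame, which lets the receiver reorder packets and detect loss.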

VoIP codecs

A VoIP codec ("coder - decoder") is an algorithm that squeezes (the "coder") digitized audio so it fits more easily into a VoIP data channel (IP packets), and then re-expands it (the "decoder") so the user can hear the audio once again. VoIP codecs operate by taking uncompressed digital audio and applying an agreed, standardized algorithm to reduce the number of bits it takes to represent that audio. It is important for both ends of a phone call to agree on what that algorithm should be, and in a VoIP phone call (using H.323, SIP or similar signaling), this agreement is reached when the call is first placed, through a process called "capabilities exchange". In other words, the codecs (coder/decoders) provide the means to convert analogue voice signals to digital signals and reverse the process on delivery. Codecs are also known as vocoders or voice coder/decoders. On conversion from analogue to digital, a data stream is packetised and transported across the network. The receiving endpoint will not only have to reassemble the packets into the correct sequence,


but also decode the contents. Clearly, commonality of standards and codecs is essential if the communication is to be intelligible. Any detected signaling tones are routed around the codec, which could otherwise modify the tones to the point where they are not recognized by the device being signaled.

Every VoIP phone contains one or more codecs, and during call establishment the phones share their lists of supported codecs. One phone, for example, may say "Hey stranger, I can support codecs A, B, or C", and the other one will respond "Nice to meet you, I can support codecs B, C, or D." At this point, both phones recognize that they could converse in either B or C (this process can easily be compared to two multilingual strangers meeting on the street, figuring out what languages they share, then deciding which of the shared languages to proceed in). Depending on how they have been set to prioritize various parameters, one phone may then say "well, since C gives better audio bandwidth than B, let's proceed with that," or "B uses a lower bit rate and my company thinks that's more important, so let's proceed with that."

Most VoIP phones contain a number of different codecs covering a range of performance levels and, often, bandwidths. Having wideband capability doesn't mean that a phone is unable to connect to a narrowband phone; it just means that it has a wider repertoire and can do both, like a musician who can play both clarinet and saxophone (perhaps not at the same time).

So now let's consider the following question: On what basis does one evaluate and choose those wideband codecs? Here's a first glance at the most important codec characteristics:
- Audio bandwidth (higher is better)
- Data rate or bit rate (how many bits per second, fewer is better)
- Audio quality loss (how much does it degrade the audio, lower is better)
- Kind of audio (does it only work with speech, or with anything?)
- Processing power required (less is better)
- Processor memory required (less is better)
- Openly available to vendors? (“yes” is essential)
- Inserted delay (audio latency caused by the algorithm, less is better)
- Resilience (how insensitive to lost or corrupted packets, more is better)
- ITU standards-based (standardized by the International Telecommunication Union - “yes” is better)
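The codec negotiation between two phones described earlier (codecs A, B, C on one side and B, C, D on the other) can be sketched as a simple selection over two lists. This is a minimal illustration, assuming that the local phone lists its codecs in order of preference; the function name is hypothetical and no particular signalling protocol's rules are modelled.

```python
from typing import List, Optional


def negotiate_codec(local_preferences: List[str],
                    remote_supported: List[str]) -> Optional[str]:
    """Return the first locally preferred codec that the remote side also supports."""
    remote = set(remote_supported)
    for codec in local_preferences:
        if codec in remote:
            return codec
    return None  # no codec in common: the call cannot carry audio


remote_phone = ["B", "C", "D"]

# A phone that values audio bandwidth ranks C first.
quality_first = negotiate_codec(["C", "B", "A"], remote_phone)

# A phone configured to save bit rate ranks B first.
bitrate_first = negotiate_codec(["B", "C", "A"], remote_phone)

print(quality_first, bitrate_first)
```

The two calls mirror the two outcomes in the text: the same pair of capability lists leads to codec C or codec B depending only on how the deciding phone orders its preferences.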

It is quickly evident from the large number of parameters, however, that no codec is likely to be "best" in all categories at any given time. As you read through this, it’s possible you already have experience in evaluating these parameters among narrowband codecs for existing VoIP systems (if you've ever compared G.711 against G.729, for example). Except for boosting the audio bandwidth to wideband, the other tradeoffs are much the same. Let's look at


some of the key parameters, and compare them among the most popular wideband VoIP codecs.

The principal wideband VoIP codecs today are:

L256: The simplest of all wideband codecs, the 7 kHz L256 directly sends all the bits of digital audio sampled into 16-bit words at 16 kilosamples per second (ksps), using no compression whatever, hence the name ("Linear 256" kbps). L256 is a basic requirement in all VoIP phones, but is seldom used because of its high bit rate.

G.719: Perhaps the best match among requirements for communication systems at 20 kHz, G.719 is a recent ITU-approved arrival that combines excellent quality for music and voice with low latency, modest processor load, and network-friendly bit rates.

G.722: This is the grandfather of 7 kHz wideband VoIP codecs, and the most widely deployed so far. G.722 applies adaptive differential pulse code modulation (ADPCM) to high and low frequencies separately, yielding an algorithm that works equally well with music or voice.

G.722.1: Also known as "Siren 7," this modern 7 kHz audio codec is in almost every videoconferencing system today and is gaining traction in VoIP because of its higher efficiency and lower bit rate. G.722.1 is a "transform" (as in "Fourier transform") codec and works by removing frequency redundancies in any kind of audio.

G.722.2: This codec, "AMR-WB," is a 7 kHz wideband extension of the popular adaptive multi-rate (AMR) cellphone algorithm, and excels in delivering wideband high-quality voice at the lowest bit rates. G.722.2's algebraic code excited linear prediction (ACELP) algorithm is optimized for speech, and works by sending constant descriptions of how to shape and stimulate a human speech tract to reproduce the sound you feed into it.

G.722.1 Annex C: Also known as "Siren 14," this is a 14 kHz extension of G.722.1 and is popular because of its wider bandwidth, its efficiency, and its availability (under license) for zero royalty.

Speex: Speex is an open-source CELP codec.

MPEG: There are more than 25 versions of the Moving Picture Experts Group (MPEG) transform codecs, each delivering a set of performance levels optimized for various parameters. The variant best suited to telecommunications is MPEG-4 AAC-LD, a lower-delay version of the intended MP3 successor, MPEG-4 AAC.

MP3: The popular MP3 format uses a form of transform coding, and is optimized for media distribution.

FLAC: The Free Lossless Audio Codec (FLAC) produces much higher bit rates than most other codecs, but compensates by preserving complete audio quality.
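The bit rate of the uncompressed L256 format in the list above follows directly from its sample size and sample rate. The short calculation below reproduces it; the function name is illustrative, and the 64 kbps used for comparison is the G.722 figure given in Table 1.5.

```python
def pcm_bit_rate_kbps(bits_per_sample: int, sample_rate_ksps: float,
                      channels: int = 1) -> float:
    """Bit rate of uncompressed linear PCM audio, in kbps."""
    return bits_per_sample * sample_rate_ksps * channels


# L256: 16-bit words at 16 kilosamples per second, one channel.
l256_kbps = pcm_bit_rate_kbps(16, 16)

# Compare with G.722, which carries the same 7 kHz bandwidth at 64 kbps.
g722_kbps = 64
ratio = l256_kbps / g722_kbps

print(l256_kbps, ratio)
```

The factor of four between the two figures is the reason the text notes that L256, although simple, is seldom used.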


Each conferencing environment has its own acoustical challenges that require an appropriately designed conferencing solution. Let's examine some of these differences.

Audio Bandwidth

Audio bandwidth corresponds to audio fidelity, that is, the ability to carry sounds ranging from very low pitches, like a kettledrum or a sonic boom, to very high pitches, like a cymbal or a plucked guitar. Therefore, more bandwidth is better. The human voice has important content beyond 14 kHz (this is why wideband telephony, even at 7 kHz, delivers such a telling improvement over older 3 kHz analog phones). The human ear can be sensitive to 20 kHz, and virtually every medium we experience today carries sound over this full range. VoIP is also working its way toward supporting up to 20 kHz, but today there are more codecs available to support 7 kHz audio than these even higher bandwidths. This is because 7 kHz in and of itself provides an easily achievable and dramatic improvement in voice-only communications. Desktop phones supporting 7 kHz are available from many vendors today, but to date only Polycom has introduced conference phones at 14 kHz and 20 kHz.

Data Rate

The data rate required by a compressed audio channel becomes important when network bandwidth is limited, especially when supporting multiple phone connections. This is a common issue in narrowband VoIP telephony (comparing the bit rates of G.711 vs. G.729 is a common discussion), and its importance in wideband-capable systems is no different. Table 1.5 shows some typical numbers.
Table 1.5. Audio bandwidth versus bit rate for some popular VoIP codecs.

BW (kHz)    Typical bit rate (kbps)
3.3         8 (G.729), 56 (G.711)
7           10 (G.722.2), 24 (G.722.1), 64 (G.722)
14          32 (G.722.1C)
20          32 (G.719), 64 (AAC-LD)
22          32 (Siren22)

From Table 1.5 we can easily conclude that typical bit rates don't necessarily rise with rising audio bandwidth; the bit rate has as much to do with the codec chosen as with the bandwidth. The reasons for this are twofold: audio


contains most of its information in the lower frequencies, so there's less information to be coded and sent in the higher frequencies, and the human ear is less sensitive to inaccuracies at the higher frequencies, so a compression algorithm can be a little less precise without being noticed. Another point to note is the span of bit rates among wideband codecs. For example, 7 kHz audio requires 64 kbps from G.722, but only 10 kbps from G.722.2. Here, the difference is due to the assumptions made by the codecs. G.722.2, an ACELP codec, assumes that it's working on human speech. It knows that it's not going to be fed the sounds of a violin or a speeding locomotive, so it takes a whole different approach to compression and consequently can be extremely efficient about it. This is why G.722.2 is preferred for cellphone use, where the cost of the bit rate is high, but another codec such as G.722.1 would be preferred if the application were broader and included multiple talkers, or music. Figure 1.21 shows how these different VoIP codecs stack up when comparing bandwidth to bit rate.

Figure 1.21: Audio bandwidth versus bit rate for different VoIP codecs.
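The codec bit rates above cover only the compressed audio payload. As a rough sketch of what one call direction consumes at the IP layer, the example below adds IPv4, UDP and RTP headers to each packet. The 20 ms packet interval, the absence of header compression and the omission of layer-2 framing are assumptions made for illustration; they are not taken from the text.

```python
# Assumptions (not from the text): IPv4 (20 bytes) + UDP (8) + RTP (12)
# headers per packet, no header compression, no layer-2 framing.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12

# Typical codec bit rates in kbps, copied from Table 1.5.
CODEC_KBPS = {
    "G.729": 8,
    "G.722.2": 10,
    "G.722.1": 24,
    "G.722.1C": 32,
    "G.719": 32,
    "G.722": 64,
}


def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20.0) -> float:
    """Return the one-way IP-layer bit rate of a single RTP stream in kbps."""
    payload_bytes = codec_kbps * packet_ms / 8.0      # kbps * ms = bits
    packets_per_second = 1000.0 / packet_ms
    bits_per_second = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8.0 * packets_per_second
    return bits_per_second / 1000.0


if __name__ == "__main__":
    for name, kbps in CODEC_KBPS.items():
        print(f"{name:9} payload {kbps:3} kbps -> on the wire {ip_bandwidth_kbps(kbps):5.1f} kbps")
```

Under these assumptions the header overhead is a fixed 16 kbps per stream, so it dominates for low-rate codecs such as G.729 and G.722.2 and matters much less for G.722.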

Processor Loading
High-complexity codecs drive up the cost of a VoIP phone or endpoint because they require faster, more expensive processors and more memory. The issue multiplies with VoIP phones that perform multi-party bridged calls internally, which is a common VoIP-enabled feature today. Table 1.6 gives a couple of good examples of how codecs can differ in their appetites for processor power. At 7 kHz, G.722.2 shows the highest demand for processor power, but recall that it also delivers the lowest bit rate. G.722.1, at one-seventh the processor power of G.722.2 and 40 percent the bit rate of G.722, is a good compromise. Comparing the two 20 kHz codecs, the difference in processor loading is striking, but surprisingly, there's no compensating advantage in bit rate or quality.


Table 1.6 MIPS versus audio bandwidth for some popular codecs.
BW (kHz)    Codec            MIPS
7           G.722.2          38
7           G.722            14
7           G.722.1          5.5
20          G.719            18
20          MPEG-4 AAC-LD    36

This may be because the MPEG codec adapts technology originally intended for media streaming and recording, whereas G.719 was always targeted at VoIP and telecommunications, but it does form a good demonstration of how important differences can pop up.

Audio Quality
One reason that comparing codecs can get tricky is that audio quality, a somewhat subjective measure, is also tightly related to bit rate within a particular codec. One codec may tout extremely low bit rates, but a quick listening will reveal that the audio quality at those lowest bit rates is almost unusable. The figures given here relate bit rates at comparable audio qualities, so the "typical" figures will often be higher than a provider's "minimum" figures. But they are realistic, and appropriate for VoIP usage. There are standard objective measures of audio quality (MOS, PESQ, etc.), but if you are making a serious comparison of codecs, it's best to do a real "apples-to-apples" test and apply the same standard test track to all candidate algorithms, with each candidate running at its planned bit rate. Private and open-source codec suppliers will be glad to provide an algorithm simulator that runs on a standard PC, making it possible for you to do this test yourself. Give some thought to this test track. Even in VoIP applications, we're not always dealing with just the human voice. You often will find two or more voices speaking at the same time, or someone talking in a room with lots of reverberation: two situations that can really throw off a codec built on a model of the human vocal tract (some CELP and ACELP implementations, for example, can be particularly sensitive to these things). Even a door closing or a pencil dropping while someone is talking can come across very strangely, so it's good to have a full test in order to build a good comparison.

Latency
Latency is the time delay from when you say a word until the other person hears it, also referred to as the "mouth-to-ear" delay. When it gets too long,


conversations become difficult and stilted, with participants frequently but inadvertently interrupting each other and not understanding why. Twenty years ago, we often heard very long latencies on long-distance calls because of the widespread use of satellite links (even at the speed of light, a couple of hops off of satellites perched roughly 22,300 miles above the earth add up to the better part of a second), but today long latencies are mostly the consequence of older videoconferencing systems or carelessly planned VoIP phone systems. A common recommendation (ITU-T G.114) is that one-way latency, which includes the codec, should not exceed 150 milliseconds. This is not a problem in a well-designed system using telecom codecs such as G.722, G.722.1, etc. But occasionally a media or streaming codec will find its way into a VoIP system with disruptive results. Media codecs, such as those used to transmit streaming audio over the Internet, are often not optimized for latency because one-way streaming connections are not sensitive to latency. Because they can insert an appreciable fraction of a second of delay, they should be avoided in VoIP and telecommunication systems. Another contributor to latency in a VoIP system is a "jitter buffer." This is a kind of shock absorber for data flow that soaks up the momentary variations that occur in any IP network. Jitter buffers are sometimes embedded within a codec, which makes it important to be sure that multiple, redundant jitter buffers are not inadvertently built into a system (a jitter buffer can add 20 to 80 milliseconds or more).

Cost
As users, we usually do not see the cost of a codec, but cost can influence its selection into a phone system or a phone. There are license fees, or royalties, associated with some codecs; often, a per-year minimum fee, with a per-port or per-phone fee, and perhaps an initial fee as well. Some of the codecs we use today, such as G.722, are royalty-free because the underlying patents have expired.
Others, such as G.722.1 Annex C, are royalty-free because their vendor has decided that the industry is better served if high-performance codecs are widely deployed. Other codecs, such as MPEG-4 AAC-LD, still bear royalties. Royalties are not necessarily a bad thing, because codecs are often the result of long and expensive research resulting in valuable characteristics (such as the low bit rate of G.722.2, which saves money in use). They are simply something to be aware of when considering your VoIP network deployment plans.

Standardization and Availability
The ITU is the de facto worldwide agency for standardization of telecommunications codecs. This is the organization that assigns the numbers to our familiar codecs; G.722.1's full name, in fact, is ITU-T G.722.1, because it is a product of the ITU Telecommunication Standardization Sector (ITU-T), and like all ITU standards it has been subjected to open, rigorous multi-vendor


evaluation before being accepted. While proprietary codecs may be incorporated in limited-use systems, it is of paramount importance that business VoIP telephony systems, which require worldwide interoperability and high reliability, be configured with ITU-approved codecs. The ITU sanction also ensures that codecs are available to all vendors on fair and reasonable terms.

To close this section, there are two trends to keep in mind in VoIP audio bandwidth today: one is strategic, and one is technical. The strategic trend is this: VoIP telephony is moving toward full-bandwidth 20 kHz sound, because the VoIP endpoint is undergoing transformation into a multi-purpose, multimedia device that integrates communications, applications, and even entertainment. As you have seen, there's little cost or bit rate penalty in going to wideband telephony using modern codecs (even the fullband G.719 codec has a lower bit rate than G.711), and competitive pressure will drive VoIP vendors to cover the full range of human hearing within a very few years. Some applications will remain at the voice-friendly 7 kHz point due to tight cost or size constraints, but we'll see an increasing tide of fully capable 20 kHz VoIP systems with unified capabilities. The technical trend follows from the strategic: which codecs will bring us to this 20 kHz world? At 7 kHz, G.722 is mature, free, and already widely deployed in endpoints and in PBXs and softswitches. G.722.2 will be deployed in applications where its higher cost is offset by its very low bit rate and high quality, much of this driven by mobile phones. Its adoption there will push the network, and consequently wired endpoints, to follow. And finally, G.722.1 adds multimedia capability at less than half the bit rate of G.722, and one-seventh the processing cost of G.722.2. These three codecs form a functionally complete set for 7 kHz performance.
The choice at 14 kHz is G.722.1 Annex C because of its maturity, modest bit rate and processing needs, and zero-cost license. And finally, 20 kHz performance in the VoIP world will come from the ITU's new G.719, the likely successor to Siren22 (which was a principal predecessor of G.719).
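The trade-offs discussed in this section (audio bandwidth, bit rate and processor load) can be turned into a simple shortlist filter. The sketch below is illustrative only: it uses the figures from Tables 1.5 and 1.6 for the codecs that appear in both tables, and the `Codec` class and `shortlist` function are hypothetical names introduced here. Cost, latency and audio quality are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    bandwidth_khz: float   # audio bandwidth
    kbps: float            # typical bit rate (Table 1.5)
    mips: float            # processor load (Table 1.6)


# Only codecs that appear in both Table 1.5 and Table 1.6.
CODECS = [
    Codec("G.722.2", 7, 10, 38),
    Codec("G.722", 7, 64, 14),
    Codec("G.722.1", 7, 24, 5.5),
    Codec("G.719", 20, 32, 18),
    Codec("MPEG-4 AAC-LD", 20, 64, 36),
]


def shortlist(min_khz: float, max_kbps: float, max_mips: float) -> list:
    """Return the codecs meeting all three limits, cheapest bit rate first."""
    fits = [
        c for c in CODECS
        if c.bandwidth_khz >= min_khz and c.kbps <= max_kbps and c.mips <= max_mips
    ]
    return sorted(fits, key=lambda c: (c.kbps, c.mips))


if __name__ == "__main__":
    for codec in shortlist(min_khz=7, max_kbps=32, max_mips=20):
        print(codec)
```

For example, asking for at least 7 kHz with at most 32 kbps and 20 MIPS leaves G.722.1 and G.719, which matches the section's view of G.722.1 as a good compromise at 7 kHz.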


1.6. Signaling for VoIP

The first hurdle to overcome when making a VoIP phone call is to establish a connection between the parties involved. In legacy telephony, this is done by switching circuits until a physical path is established between locations. The Internet Protocol, on the other hand, is connectionless by nature. IP packets may take different routes and can arrive out of order. For time-sensitive applications such as voice and video this is unacceptable, so steps must be taken to establish a session between the endpoints and to keep it open for the duration of the call. Similar to the handshake of the DHCP protocol, the VoIP signaling protocols exchange request and response messages (carried over TCP or UDP, depending on the protocol) to set up, manage and tear down the VoIP phone call. Signaling protocols are not concerned with the actual media stream of voice or video, nor with QoS and traffic engineering. Their basic functions are first to initiate a session, then to find common ground for communication between the parties involved, and finally to terminate the session at the call's end. The most widely used signaling protocols for VoIP (H.323, SIP and MGCP) are presented in the following subsections.

1.6.1. H.323
Derived from related specifications for multimedia conferencing over ISDN, H.323 defines a protocol architectural framework (see Figure 1.22) that encompasses the ability to use it in both direct-routed and server-routed signalling modes. Within this architecture, the server-routed signalling mode is known as ‘gatekeeper routed’ signalling due to the term used within the H.323 specifications to describe the server component.

Figure 1.22. Protocol Architecture for H.323.


H.323 is actually an umbrella standard, encompassing several other protocols, including H.225, H.245, and others. It acts as a wrapper for a suite of media control recommendations by the ITU. Each of these protocols has a specific role in the call set-up process, and all but one use dynamically assigned ports. Figure 1.23 shows the basic H.323 architecture and Figure 1.24 provides an overview of the typical H.323 registration and call set-up process.

Figure 1.23. Basic Architecture of H.323.

An H.323 network is made up of several endpoints (terminals), a gateway, and possibly a gatekeeper, a Multipoint Control Unit (MCU), and a Back End Service (BES). The gatekeeper is often one of the main components in H.323 systems. It provides address resolution and bandwidth control. The gateway serves as a bridge between the H.323 network and the outside world of (possibly) non-H.323 devices. This includes SIP networks and traditional PSTN networks. This brokering can add to delays in VoIP, and hence there has been a movement towards the consolidation of at least the two major VoIP protocols. A Multipoint Control Unit is an optional element that facilitates multipoint conferencing and other communications between more than two endpoints. Gatekeepers are an optional but widely used component of a VoIP network. If a gatekeeper is present, a Back End Service may exist to maintain data about endpoints, including their permissions, services, and configuration.

First approved in 1996 and revised several times since, H.323 is a standard recommended by the Telecommunication Standardization Sector of the ITU (ITU-T). It defines real-time multimedia communications and conferencing over packet-based networks that do not provide a guaranteed Quality of Service (QoS), such as LANs and the Internet. As mentioned before, it is an “umbrella standard” belonging to the H.32x class of standards recommended by the ITU for videoconferencing applications.


Figure 1.24. H.323 typical registration and call set-up process

The H.32x standards were amongst the earliest to classify and provide solutions for VoIP (see section 1.1): H.310 for conferencing over Broadband ISDN (B-ISDN); H.320 for conferencing over Narrowband ISDN; H.321 for conferencing over ATM; H.322 for conferencing over LANs with guaranteed QoS; H.324 for conferencing over Public Switched Telephone Networks. Earlier versions of H.323 had a large overhead in control signalling, particularly when establishing a session. This presented some scalability limitations, especially when a large number of simultaneous sessions had to be established. Subsequent versions have focused on addressing these issues. However, H.323 is an immensely powerful technology, incorporating many features that can be switched on or off depending upon the network deployment context. It is by the careful choice of these options and appropriate design of gatekeeper-based applications to route signalling messages that H.323 networks may be scaled to very large dimensions. As can be seen in Figure 1.24, the calling client initially engages in a registration sequence to identify itself to the network. This type of behaviour is an essential feature of a VoIP system and


effectively provides a degree of inherent terminal and user mobility functionality since the client may register from anywhere in the connected IP network.
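The registration and call set-up sequence can be written down as an ordered list of messages. The sketch below assumes a direct-routed call in which both endpoints ask the gatekeeper for admission; the message names are standard H.225.0 RAS, H.225.0 call signalling and H.245 messages, but the exact sequence in Figure 1.24 may differ, and the role names (`caller`, `callee`, `gatekeeper`) are chosen here for illustration.

```python
from collections import Counter

# Each step is (protocol, sender, receiver, message).
# RAS = H.225.0 Registration, Admission and Status (endpoint <-> gatekeeper).
CALL_FLOW = [
    ("RAS", "caller", "gatekeeper", "RRQ"),   # Registration Request
    ("RAS", "gatekeeper", "caller", "RCF"),   # Registration Confirm
    ("RAS", "caller", "gatekeeper", "ARQ"),   # Admission Request
    ("RAS", "gatekeeper", "caller", "ACF"),   # Admission Confirm
    ("H.225.0", "caller", "callee", "Setup"),
    ("H.225.0", "callee", "caller", "Call Proceeding"),
    ("RAS", "callee", "gatekeeper", "ARQ"),
    ("RAS", "gatekeeper", "callee", "ACF"),
    ("H.225.0", "callee", "caller", "Alerting"),
    ("H.225.0", "callee", "caller", "Connect"),
    ("H.245", "caller", "callee", "TerminalCapabilitySet"),
    ("H.245", "caller", "callee", "OpenLogicalChannel"),
]


def ladder(flow):
    """Format the flow as one text line per message."""
    return [f"{proto:8} {src:>10} -> {dst:<10} {msg}" for proto, src, dst, msg in flow]


def messages_per_protocol(flow):
    """Count how many messages each sub-protocol contributes."""
    return Counter(proto for proto, *_ in flow)


if __name__ == "__main__":
    print("\n".join(ladder(CALL_FLOW)))
    print(dict(messages_per_protocol(CALL_FLOW)))
```

Even in this abbreviated form, half of the messages are gatekeeper (RAS) exchanges, which illustrates the control-signalling overhead mentioned above.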

1.6.2. SIP
In contrast to H.323, the IETF has been developing a competing, but potentially complementary, architecture for multiparty, multimedia conferencing on the Internet as is shown in Figure 1.25. The session initiation protocol (SIP) is a component of this architecture and provides the basic session control mechanism used within it. The SIP protocol has gained a substantial following within the industry by offering the potential for an easily implemented method of establishing and controlling basic voice calls. From its very lightweight inception, SIP has been developed to address the challenges of being used outside basic point-to-point voice calls and an overly simplistic direct-mode signaling model. As can be seen by contrasting the same simple call flows using H.323 (Figure 1.24) with those using SIP (Figure 1.26), the differences between the technologies are not always as obvious as some may wish them to be, which belies the true roles each should be able to play.

Figure 1.25. IETF multimedia conferencing architecture.

Moreover, the architecture of a SIP network is different from the H.323 structure. A SIP network is made up of end points, a proxy and/or redirect server, a location server, and a registrar. A diagram is provided in Figure 1.27. In the SIP model, a user is not bound to a specific host (this is also true in H.323, where the gatekeeper provides address resolution). The user initially reports their location to a registrar, which may be integrated into a proxy or redirect server. This information is in turn stored in the external location server. Furthermore, the messages from endpoints must be routed through either a proxy or redirect server. The proxy server intercepts messages from endpoints


or other services, inspects their “To:” field, contacts the location server to resolve the username into an address and forwards the message along to the appropriate end point or another server. Redirect servers perform the same resolution functionality, but the onus is placed on the end points to perform the actual transmission. That is, redirect servers obtain the actual address of the destination from the location server and return this information to the original sender, which then must send its message directly to this resolved address (similar to H.323 direct-routed calls with a gatekeeper). The basic SIP call set-up exchange (INVITE, 200 OK, ACK) is modeled on the three-way handshake used in TCP (see Figure 1.26).

Figure 1.26. Typical registration and call set-up using SIP.
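The difference between proxy and redirect behaviour described above can be sketched in a few lines. This is a toy model, not a SIP stack: the dictionary standing in for the location server, the addresses and the function names are all hypothetical, and the lookup keys on the “To:” value as the text describes (real proxies route on the Request-URI). The 302 and 404 status codes are standard SIP response codes.

```python
# Stand-in for the location server: address-of-record -> current contact.
LOCATION_SERVER = {}


def register(address_of_record: str, contact: str) -> None:
    """Registrar role: store where the user can currently be reached."""
    LOCATION_SERVER[address_of_record] = contact


def proxy_handle(request: dict) -> dict:
    """Proxy role: resolve the destination and forward the request itself."""
    contact = LOCATION_SERVER.get(request["to"])
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return {"action": "forward", "next_hop": contact, "request": request}


def redirect_handle(request: dict) -> dict:
    """Redirect role: resolve the destination and hand it back to the sender."""
    contact = LOCATION_SERVER.get(request["to"])
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    return {"status": 302, "reason": "Moved Temporarily", "contact": contact}


if __name__ == "__main__":
    register("sip:bob@example.com", "sip:bob@192.0.2.20")
    invite = {"method": "INVITE", "to": "sip:bob@example.com"}
    print(proxy_handle(invite))
    print(redirect_handle(invite))
```

In the proxy case the server stays in the signalling path; in the redirect case the caller must send a new request to the returned contact address.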

The main advantage of SIP is that it offers an easily implemented, powerful control environment capable of scaling to very large networks due to its simple request/response message format. This, combined with its relative immaturity compared with H.323, encouraged its adoption in the access segment of third generation networks, since this afforded the opportunity to incorporate any mobile-specific elements that were subsequently identified. On the other hand, both protocols can be extended to manage new capabilities. The argument has been advanced that H.323 is more stable because of its maturity, but SIP provides better support for some functionality and is easier to implement. Fortunately, the ITU and the IETF are now co-operating in developing standards in this area.
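The simple request/response format is easiest to appreciate by looking at the text of a message. The sketch below builds a minimal INVITE and parses status lines in the style of RFC 3261. It is only an illustration: the header values are made up, no SDP body is attached, and the helper names are hypothetical rather than part of any SIP library.

```python
CRLF = "\r\n"


def build_request(method: str, uri: str, headers: dict) -> str:
    """Build a SIP request: start line, header lines, then an empty line."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF


def parse_message(text: str):
    """Split a SIP message into its start line and a dict of headers."""
    head = text.split(CRLF + CRLF, 1)[0]
    start_line, *header_lines = head.split(CRLF)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return start_line, headers


def parse_status_line(line: str):
    """Parse a response start line such as 'SIP/2.0 180 Ringing'."""
    _version, code, reason = line.split(" ", 2)
    return int(code), reason


if __name__ == "__main__":
    invite = build_request(
        "INVITE",
        "sip:bob@example.com",
        {
            "Via": "SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds",
            "Max-Forwards": "70",
            "To": "<sip:bob@example.com>",
            "From": "<sip:alice@example.com>;tag=1928301774",
            "Call-ID": "a84b4c76e66710@pc33.example.com",
            "CSeq": "1 INVITE",
            "Content-Length": "0",
        },
    )
    print(invite)
    print(parse_status_line("SIP/2.0 180 Ringing"))
```

Because every message is plain text with a start line and named headers, a basic parser fits in a few lines, which is part of why SIP is considered easy to implement.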


Figure 1.27. Typical SIP Architecture.

Moreover, in Figure 1.28 a simplified illustration of a call between VoIP SIP phones within the same SIP IP telephony network is given. When calls are made within a single SIP IP telephony network, the process typically involves the origination and destination phones and a single proxy server.

Figure 1.28. Calls Within a Single SIP VoIP Network

In this illustration, the following sequence occurs:
1. Cisco SIP IP phone A initiates a call by sending an INVITE message to the SIP proxy server. (There can be more than one proxy server for redundancy.)
2. The SIP proxy server interacts with the location server and possibly with application services to determine user addressing, location, or features.
3. The SIP proxy server then proxies the INVITE message to the destination phone.
4. Responses and acknowledgments are exchanged, and an RTP session is established between Cisco SIP IP phones A and B.

When calls are made between SIP VoIP networks, the process typically involves the origination and destination phones as well as two or more SIP proxy servers. Figure 1.29 is a simplified illustration of a call between SIP VoIP phones in different SIP VoIP networks.

Figure 1.29 Calls Between SIP IP Telephony Networks

In this illustration, the following sequence occurs:
1. Cisco SIP IP phone A initiates a call by sending an INVITE to the SIP proxy server. (There can be more than one proxy server for redundancy.)
2. The SIP proxy server might interact with application services such as RADIUS to obtain additional information.
3. The SIP proxy server in phone A's network contacts the SIP proxy server in phone B's network. The local proxy uses the domain name system (DNS) domain to determine if it should handle the call or route it to another proxy. The remote proxy is contacted based on the domain of the destination device.
4. The SIP proxy server in phone B's network might interact with application services to obtain additional information.
5. The SIP proxy server in phone B's network contacts the destination phone (Cisco SIP IP phone B).
6. Responses and acknowledgments are exchanged, and an RTP session is established between Cisco SIP IP phones A and B.

Moreover, SIP 200 OK, 180 Ringing, and 183 Session Progress messages pass through the same set of proxies, because they belong to the same call sequence. SIP CANCEL or BYE requests sent by a terminating user agent might or might not pass through the same set of proxies.

Furthermore, when calls are made between a SIP VoIP network and a traditional telephony network, the process typically involves the origination phone, one or more proxy servers, a gateway, and a PBX or PSTN device. Figure 1.30 is a simplified illustration of a call between a Cisco SIP IP phone and a traditional phone in a traditional PSTN.
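The domain-based decision in step 3 of the inter-network sequence above (handle the call locally or pass it to another proxy) can be sketched as follows. This is a simplification under stated assumptions: the function names and example domains are hypothetical, and a real proxy would locate the remote proxy through DNS lookups on the destination domain rather than simply returning the domain name.

```python
def domain_of(sip_uri: str) -> str:
    """Extract the lower-cased host part of a 'sip:user@host[:port][;params]' URI."""
    host_part = sip_uri.split("@", 1)[1]
    return host_part.split(":", 1)[0].split(";", 1)[0].lower()


def route(destination_uri: str, local_domain: str):
    """Decide whether the local proxy serves the destination or must forward."""
    domain = domain_of(destination_uri)
    if domain == local_domain.lower():
        # Destination belongs to this network: look it up in the local location server.
        return ("deliver-locally", domain)
    # Destination belongs to another network: hand over to the proxy for that domain.
    return ("forward-to-proxy", domain)


if __name__ == "__main__":
    print(route("sip:bob@a.example.com", "a.example.com"))
    print(route("sip:carol@b.example.net:5060", "a.example.com"))
```

Applied at each proxy in turn, the same check explains why a call between two networks passes through at least two proxies.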

Figure 1.30 Calls Between a SIP VoIP Network and a Traditional Telephony Network

In this illustration, the following occurs:
1. Cisco SIP IP phone A initiates a call by sending an INVITE to the SIP proxy server. (There can be more than one proxy server for redundancy.)
2. The SIP proxy server might interact with application services such as RADIUS to obtain additional information.
3. The SIP proxy server proxies the INVITE to the Cisco SIP gateway.
4. The Cisco SIP gateway establishes communication with the traditional telephony network, in this case a PBX.
5. Responses and acknowledgments are exchanged, and an RTP session is established between the Cisco SIP IP phone and the Cisco SIP gateway.

The signaling on the plain-old-telephone-service (POTS) side of the gateway is translated into SIP messages on the IP network to provide proper ringback signaling to the end-user phones.

Finally, Table 1.7 lists various VoIP services that are available with the Cisco VoIP Infrastructure Solution for SIP.

Table 1.7 Services of the VoIP Infrastructure Solution for SIP (given by Cisco).
Service: Direct dialing based on digit dialing
Description: Allows users to initiate or receive a call using a standard E.164 number format in a local, national, or international format.

Service: Direct dialing based on email address
Description: Allows users to initiate or receive a call using an email address instead of a phone number.

Service: Private network dialing plan support
Description: Allows administrators to implement private feature sets. The features allow for both originations and terminations from either the IP network or existing PSTN networks.

Service: Direct inward dialing
Description: Allows users from outside the SIP IP telephony network to dial a Cisco SIP IP phone number directly.

Service: Direct outward dialing
Description: Allows users within the SIP IP telephony network to obtain an outside line (for placing a call to a number outside the system) without the aid of a system attendant. This is typically accomplished by dialing a prefix number such as 8 or 9.

Service: Consultation hold
Description: Allows users to place a call from another user on hold.

Service: Call forward network (unconditional, busy, and no answer)
Description: Allows users to have the network forward calls. The user can request that all calls be forwarded (unconditional) or that only unanswered calls (busy or no answer) be forwarded.

Service: Do not disturb
Description: Allows the user to instruct the system to intercept incoming calls during specified periods of time when the user does not want to be disturbed.

Service: Three-way calling
Description: Allows a user to receive a call and then add another user to the call. For example, user B receives a call from user A. User B then places user A on hold, contacts user C, and then reinstates the session with user A so that all three can participate in the call. User B acts as the bridge.

Service: Call transfer with consultation (attended)
Description: Allows users to transfer a call to another user. The transferring user places the other user on hold and calls the new number (equivalent to consultation hold). If the call is answered, the user can notify the new third user before the call is transferred.

Service: Call transfer without consultation (unattended)
Description: Allows users to transfer a call to another user. The transferring user transfers the call to the new user without first contacting the third user.

Service: Call waiting
Description: Provides an audible tone to indicate that an incoming call is waiting. The user can then decide to terminate the existing call and take the new one or to route the unanswered Call Waiting call to another destination.

Service: Multiple directory numbers
Description: Allows multiple directory numbers to be logically assigned to a terminal.

Service: Caller ID blocking
Description: Allows the user to instruct the system to block their phone number or email address from phones that have caller identification capabilities.

Service: Anonymous call blocking
Description: Allows the user to instruct the system to block any calls for which the identification is blocked.

Service: Message Waiting Indication (via unsolicited NOTIFY)
Description: Lights to indicate that a new voice message is in a subscriber's mailbox. If the subscriber listens to the message but does not save or delete the message, the light remains on. If a subscriber listens to the new message or messages, and saves or deletes them, the light goes off. The message waiting indicator is controlled by the voice-mail server.

1.6.3. Media Gateway Control Protocol (MGCP)
The Media Gateway Control Protocol (MGCP) facilitates communication between the components of a “decomposed” VoIP gateway. The decomposed gateway appears as a single logical VoIP gateway but consists of at least three distributed components: a call agent (CA), also called a media gateway controller (MGC); at least one media gateway (MG), which converts media signals between circuits and packets; and at least one signaling gateway (SG), which links to the public switched telephone network (PSTN). MGCP is used among the components of this system. One MGC can control multiple MGs, offering substantial cost reductions for deployment of large VoIP systems. MGCP was published in 1999 as an “Informational” document by the IETF in RFC 2705, which was updated in 2003 by RFC 3435 (also Informational). A “standards-track” IETF protocol for the same purpose was Megaco, defined in 2000 by RFC 3015; that was replaced in 2003 by the Gateway Control Protocol, version 1.0 (RFC 3525). MGCP remains the de facto industry standard. Figure 1.31 shows the relationship of MGCP to other VoIP system components.

Figure 1.31 General Scenario for MGCP development.
Source: “Security Considerations for Voice Over IP Systems,” NIST

Initially, MGCP was a combination of the Simple Gateway Control Protocol (SGCP) and the Internet Protocol Device Control (IPDC) protocol, published as Informational RFC 2705 (mentioned above). MGCP has been widely deployed and is maintained under the auspices of the Softswitch Consortium and PacketCable. Currently, the PacketCable profile of MGCP is being standardized in ITU Study Group 9. The drive behind the development of Megaco/H.248 was the need to meet various requirements that were not addressed properly by MGCP. Conceptually, Megaco/H.248 is an evolution of MGCP. However, the implementation is different and the two are not directly compatible. Megaco/H.248 addresses the same types of applications as MGCP but in a more generic and elegant way. Megaco/H.248 is a collaborative effort of the ITU and the IETF, following an agreement by both bodies to cooperate on a single unified protocol. Megaco is published by the IETF as RFC 3015. H.248 is published by ITU Study Group 16. The IETF and ITU-T compromised by agreeing to accept two syntaxes with the same semantics, one in text and the other in binary (ASN.1) format. In practice, the text format is much more popular than the binary format. Megaco and H.248 are identical (except for the boilerplate). The Megaco/H.248 protocol is the most recent of the VoIP protocols to be ratified. However, there is still much work to be done. At the time of writing, a number of annexes to Megaco/H.248 that describe various packages are in different stages of standardization. The Megaco MIB is still being defined in the IETF, while Annex L of H.323 specifies the tunneling of Megaco/H.248 over H.323 for stimulus-based signaling. The Megaco working group in the IETF has recently been re-chartered to work on a second generation of Megaco (version 2). The following paragraphs describe the fundamental entities of the MGCP and Megaco/H.248 models.
Moreover, MGCP describes a call control architecture in which the call control intelligence resides outside the gateways and is handled by external call control elements. MGCP assumes that these call control elements will synchronize with each other by sending coherent commands to the gateways under their control. MGCP is a master/slave protocol in which the gateways are expected to execute commands sent by the call control elements. MGCP does not define a mechanism for synchronizing call control elements. MGCP assumes a Connection model where the basic constructs are Endpoints and Connections (see Figure 1.32).

Figure 1.32: MGCP Endpoints and Connections

An Endpoint is a logical representation of a physical entity, such as an analog phone or a channel in a trunk. Endpoints are sources or sinks of data and can be physical or virtual. Physical Endpoint creation requires hardware installation, while software is sufficient for creating a virtual Endpoint. An interface on a gateway that terminates a trunk connected to a PSTN switch is an example of a physical Endpoint. An audio source in an audio-content server is an example of a virtual Endpoint. For example, when you tell an Endpoint to “ring”, the Endpoint makes the analog phone actually ring. Or when someone picks up the receiver of an analog phone and it goes “off hook”, a Media Gateway will recognize that an event has occurred at the Endpoint and it will behave appropriately. In a trunking gateway, the bearer channel is an Endpoint. Events and signals occur at Endpoints. A phone going off hook is an event that the gateway detects, while ringing is a signal that the gateway applies to the phone. In Figure 1.32, the Endpoints are A, B, C and D. The MGC knows four objects: A@MG1, B@MG1, C@MG2, D@MG2. Each MG knows two objects: MG1 knows about A and B, and MG2 knows about C and D. When an event occurs at the physical phone, the Endpoint object of the phone in the MG recognizes that an event has occurred. The MG notifies the object of that particular Endpoint in the MGC. The MGC then acts accordingly and changes the state. Moreover, an Endpoint holds a set of Connections. Connections may be either point-to-point or multipoint. A point-to-point Connection associates two Endpoints. Once this association is established for both Endpoints, data transfer between these Endpoints can begin. A multipoint Connection is established by connecting the Endpoint to a multipoint session. Connections can be established over several types of bearer networks: audio packet transmission using RTP


and UDP over a TCP/IP network; audio packet transmission using AAL2 or another adaptation layer over an ATM network; and transmission of packets over an internal Connection. The Endpoints can be in separate gateways or in the same gateway for both point-to-point and multipoint Connections. For more details about the Media Gateway Control Protocol (MGCP) Version 1.0, see RFC 3435: http://www.javvin.com/protocol/rfc3435.pdf
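The Endpoint and Connection model of Figure 1.32 can be mirrored in a small data structure. The sketch below is a simplified model rather than an MGCP implementation: the class and method names are hypothetical, and a single shared connection identifier is used for both ends, whereas a real call agent creates a Connection on each Endpoint with separate commands.

```python
class MediaGateway:
    """A gateway that knows only its own Endpoints and their Connections."""

    def __init__(self, name, endpoint_names):
        self.name = name
        # Endpoint name -> set of Connection ids held by that Endpoint.
        self.connections = {endpoint: set() for endpoint in endpoint_names}


class MediaGatewayController:
    """The MGC (call agent): sees every Endpoint as 'endpoint@gateway'."""

    def __init__(self, gateways):
        self.gateways = {gw.name: gw for gw in gateways}
        self._next_connection_id = 1

    def known_endpoints(self):
        return sorted(
            f"{endpoint}@{gw.name}"
            for gw in self.gateways.values()
            for endpoint in gw.connections
        )

    def connect(self, first, second):
        """Associate two Endpoints with a point-to-point Connection."""
        connection_id = self._next_connection_id
        self._next_connection_id += 1
        for full_name in (first, second):
            endpoint, gateway = full_name.split("@", 1)
            self.gateways[gateway].connections[endpoint].add(connection_id)
        return connection_id


if __name__ == "__main__":
    mg1 = MediaGateway("MG1", ["A", "B"])
    mg2 = MediaGateway("MG2", ["C", "D"])
    mgc = MediaGatewayController([mg1, mg2])
    print(mgc.known_endpoints())
    mgc.connect("A@MG1", "C@MG2")
    print(mg1.connections, mg2.connections)
```

The master/slave split is visible in the structure: all decisions are taken in the controller object, and the gateway objects only hold state that the controller sets.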


References
[1] ITU-T Rec. Y.2001, ‘General overview of NGN’.
[2] ITU-T Rec. Y.2211, ‘IMS-based real-time conversational multimedia services over NGN’.
[3] ITU-T Rec. Y.2011, ‘General principles and general reference model for next-generation networks’.
[4] ITU-T Rec. Y.2237, ‘Functional model and service scenarios for QoS enabled mobile VoIP service’.
[5] ITU-T Rec. H.323, ‘Packet-based multimedia communications systems’.
[6] RFC 4168, ‘The Stream Control Transmission Protocol (SCTP) as a Transport for the Session Initiation Protocol (SIP)’, IETF.
[7] RFC 3261, ‘SIP: Session Initiation Protocol’, IETF, 2002.
[8] ITU-T NGN – GSI Release 1 ‘NGN_FG-book II’.
[9] ‘Implementing Media Gateway Control Protocols’, RADVision, 2002.
[10] ITU-T Rec. Y.2012, ‘Functional requirements and architecture of the NGN’.
[11] 3GPP TR 23.882, ‘3GPP system architecture evolution, report on technical options and conclusions’.
[12] IETF RFC 3095, 2001.
[13] ITU-T Rec. Y.1901, ‘Requirements for the support of IPTV services’.
[14] Y.1900-Y.1999: IPTV over NGN.
[15] Y.2250-Y.2299: Service aspects: Interoperability of services and networks in NGN.
[16] ITU-T Rec. Y.1910, ‘IPTV functional architecture’.
[17] ITU-T Rec. H.720, ‘Overview of IPTV terminal devices and end systems’.
[18] ITU-T Rec. H.721, ‘IPTV Terminal devices: Basic model’.
[19] ITU-T Rec. X.1191, ‘Functional requirements and architecture for IPTV security aspects’.
[20] ETSI ES 282 001 version 1.1.1, ‘Protocols for advanced networking (TISPAN); NGN Functional Architecture Release 1’.
[21] ETSI, ‘Long term evolution of the 3GPP radio technology’ and ‘System architecture evolution’.
[22] ETSI TS 122 228, ‘Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); Service requirements for the Internet Protocol (IP) multimedia core network subsystem (IMS); Stage 1 (3GPP TS 22.228 Release 6)’.
[23] ITU-T Rec. Y.2216, ‘NGN capability requirements to support the multimedia communication centre service’.
[24] ITU-T Rec. Y.2205, ‘Next Generation Networks – Emergency telecommunications – Technical considerations’.

