VoIP QoS Issues:
The reduced cost and bandwidth savings of carrying voice over packet networks come with some quality-of-service (QoS) issues that are unique to packet networks.
Delay
Delay causes two problems: echo and talker overlap. Echo is caused by the signal reflections of the speaker's voice from the far-end telephone equipment back into the speaker's ear. Echo becomes a significant problem when the round-trip delay becomes greater than 50 milliseconds. As echo is perceived as a significant quality problem, voice-over-packet systems must address the need for echo control and implement some means of echo cancellation.
Talker overlap (or the problem of one talker stepping on the other talker's speech) becomes significant if the one-way delay becomes greater than 250 milliseconds. The end-to-end delay budget is therefore the major constraint and driving requirement for reducing delay through a packet network.
The following are sources of delay in an end-to-end, voice-over-packet call:
Accumulation Delay (Sometimes Called Algorithmic Delay)
This delay is caused by the need to collect a frame of voice samples to be processed by the voice coder. It is related to the type of voice coder used and varies from a single sample time (0.125 milliseconds, that is, 125 microseconds) to many milliseconds. A representative list of standard voice coders and their frame times follows:
  • G.726 adaptive differential pulse-code modulation (ADPCM) (16, 24, 32, 40 kbps): 0.125 milliseconds
  • G.728 low-delay code-excited linear prediction (LD-CELP) (16 kbps): 2.5 milliseconds
  • G.729 CS-ACELP (8 kbps): 10 milliseconds
  • G.723.1 multirate coder (5.3, 6.3 kbps): 30 milliseconds
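As a rough check on these figures, the sketch below converts frame times into sample counts. It assumes the standard 8 kHz narrowband telephony sampling rate, which the text does not state explicitly; at that rate one sample lasts 0.125 milliseconds (125 microseconds). The frame times are copied from the list above, and the names SAMPLE_RATE_HZ, FRAME_TIME_MS, and samples_per_frame are illustrative only.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate

# Frame times in milliseconds, as quoted in the list of voice coders.
FRAME_TIME_MS = {
    "G.726": 0.125,
    "G.728": 2.5,
    "G.729": 10.0,
    "G.723.1": 30.0,
}


def sample_period_ms(rate_hz=SAMPLE_RATE_HZ):
    """Duration of one voice sample in milliseconds."""
    return 1000.0 / rate_hz


def samples_per_frame(coder):
    """Number of sample periods spanned by one frame of the given coder."""
    return round(FRAME_TIME_MS[coder] / sample_period_ms())


if __name__ == "__main__":
    for coder, frame_ms in FRAME_TIME_MS.items():
        print(f"{coder}: {frame_ms} ms per frame, {samples_per_frame(coder)} sample(s)")
```

Under this assumption the accumulation delay grows in direct proportion to the number of samples a coder must collect before it can start encoding.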
Processing Delay
This delay is caused by the actual process of encoding and collecting the encoded samples into a packet for transmission over the packet network. The encoding delay is a function of both the processor execution time and the type of algorithm used. Often, multiple voice-coder frames will be collected in a single packet to reduce the packet network overhead. For example, three frames of G.729 code words, equaling 30 milliseconds of speech, may be collected and packed into a single packet.
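The trade-off between packet overhead and delay in the G.729 example can be worked through numerically. The sketch below is illustrative only: it assumes the voice frames are carried over RTP/UDP/IPv4 with no header options or compression (20 + 8 + 12 = 40 header bytes), a transport the text does not specify, and the function name packet_stats is hypothetical.

```python
# Assumption: IPv4 (20) + UDP (8) + RTP (12) headers, no options, no compression.
HEADER_BYTES = 20 + 8 + 12


def packet_stats(bit_rate_bps, frame_ms, frames_per_packet):
    """Return speech duration, payload size, and header overhead for one packet."""
    speech_ms = frame_ms * frames_per_packet
    payload_bytes = bit_rate_bps * speech_ms / 1000 / 8
    total_bytes = payload_bytes + HEADER_BYTES
    return {
        "speech_ms": speech_ms,
        "payload_bytes": payload_bytes,
        "overhead_fraction": HEADER_BYTES / total_bytes,
    }


if __name__ == "__main__":
    # G.729 at 8 kbps with 10 ms frames: one frame versus three frames per packet.
    for n in (1, 3):
        print(n, packet_stats(8000, 10, n))
```

With one frame per packet the headers make up 80 percent of the bytes sent; packing three frames lowers that to about 57 percent, at the cost of 20 milliseconds of additional packetization delay.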
Network Delay
This delay is caused by the physical medium and protocols used to transmit the voice data and by the buffers used to remove packet jitter on the receive side. Network delay is a function of the capacity of the links in the network and the processing that occurs as the packets transit the network. The jitter buffers add delay, which is used to remove the packet-delay variation to which each packet is subjected as it transits the packet network. This delay can be a significant part of the overall delay, as packet-delay variations can be as high as 70 to 100 milliseconds in some frame-relay and IP networks.
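To see how these sources add up against the delay budget, the sketch below sums a set of one-way delay components and compares the total with the 250-millisecond talker-overlap threshold quoted earlier. The individual component values are hypothetical numbers chosen for illustration, not measurements, and the function names are not from any particular product.

```python
ONE_WAY_LIMIT_MS = 250  # talker-overlap threshold quoted in the text


def one_way_delay_ms(components):
    """Total one-way delay as the sum of the individual delay sources."""
    return sum(components.values())


def within_budget(components, limit_ms=ONE_WAY_LIMIT_MS):
    """True if the total one-way delay does not exceed the limit."""
    return one_way_delay_ms(components) <= limit_ms


# Hypothetical example call: every value below is an assumption.
example = {
    "accumulation": 30,     # e.g. three 10 ms frames collected per packet
    "processing": 10,       # encoding and packetization
    "network_transit": 60,  # links and intermediate nodes
    "jitter_buffer": 80,    # sized to absorb packet-delay variation
}

if __name__ == "__main__":
    print(one_way_delay_ms(example), within_budget(example))
```

In this example the jitter buffer is the largest single contributor, which is why its size receives so much attention.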
Jitter
The delay problem is compounded by the need to remove jitter, a variable interpacket timing caused by the network a packet traverses. Removing jitter requires collecting packets and holding them long enough to allow the slowest packets to arrive in time to be played in the correct sequence. This causes additional delay.
The two conflicting goals of minimizing delay and removing jitter have engendered various schemes to adapt the jitter buffer size to match the time-varying requirements of network jitter removal. This adaptation has the explicit goal of minimizing the size and delay of the jitter buffer, while at the same time preventing buffer underflow caused by jitter. Two approaches to adapting the jitter buffer size are detailed below. The approach selected will depend on the type of network the packets are traversing.
The first approach is to measure the variation of packet level in the jitter buffer over a period of time and incrementally adapt the buffer size to match the calculated jitter. This approach works best with networks that provide a consistent jitter performance over time, such as ATM networks.
The second approach is to count the number of packets that arrive late and create a ratio of these packets to the number of packets that are successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined, allowable late-packet ratio. This approach works best with networks that have highly variable packet-interarrival intervals, such as IP networks.
In addition to the techniques described, the network must be configured and managed to provide minimal delay and jitter, enabling a consistent QoS.
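The two adaptation approaches described in this section can be sketched as simple update rules. This is a minimal illustration rather than a reference implementation: the text does not specify the update rule, so the fixed step size, the clamping limits, the 1 percent allowable late-packet ratio, and the function names are all assumptions.

```python
def _clamp(value, low, high):
    return max(low, min(high, value))


def adapt_by_level_variation(target_ms, level_samples_ms,
                             step_ms=5, min_ms=10, max_ms=200):
    """Approach 1: step the buffer size toward the measured level variation.

    level_samples_ms holds buffer fill levels (in ms) observed over a window;
    their spread is used here as the calculated jitter (an assumption).
    """
    jitter_ms = max(level_samples_ms) - min(level_samples_ms)
    if jitter_ms > target_ms:
        target_ms += step_ms   # buffer too small for the observed variation
    elif jitter_ms < target_ms - step_ms:
        target_ms -= step_ms   # buffer larger than needed, reduce delay
    return _clamp(target_ms, min_ms, max_ms)


def adapt_by_late_ratio(target_ms, late, processed, allowed_ratio=0.01,
                        step_ms=5, min_ms=10, max_ms=200):
    """Approach 2: steer the late-packet ratio toward an allowed value."""
    ratio = late / processed if processed else 0.0
    if ratio > allowed_ratio:
        target_ms += step_ms   # too many late packets, hold packets longer
    elif ratio < allowed_ratio / 2:
        target_ms -= step_ms   # comfortably below target, trim the delay
    return _clamp(target_ms, min_ms, max_ms)


if __name__ == "__main__":
    print(adapt_by_level_variation(40, [20, 35, 70, 30]))
    print(adapt_by_late_ratio(40, late=5, processed=100))
```

Both rules change the buffer by one small step per measurement window, so the playout delay drifts gradually instead of jumping. A real implementation would also need to decide when a resize can be applied without audible artifacts, which is outside the scope of this sketch.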