- VoIP stands for Voice over Internet Protocol and carries calls as data packets rather than over a dedicated voice circuit.
- Session Initiation Protocol (SIP) is the signalling standard most commonly used to set up, manage and end VoIP calls.
- The G.711 codec defined by the ITU encodes voice at 64 kilobits per second per direction without compression.
- The G.729 codec compresses voice to around 8 kilobits per second, using far less bandwidth than G.711.
- Jitter, packet loss and insufficient bandwidth are the main causes of degraded VoIP call quality.
VoIP converts speech into digital packets, uses SIP to set up the call and a codec such as G.711 to encode the audio, then routes the packets over the internet to the other party in near real time.
Last reviewed: June 2026
What VoIP Is
Every time a call is placed over a broadband line rather than a traditional copper circuit, Voice over Internet Protocol is doing the work in the background. VoIP is the family of technologies that lets the human voice travel across the same internet that carries web pages and video. Rather than holding open a dedicated electrical path between two telephones for the duration of a call, VoIP breaks the conversation into small digital packets and sends them across a shared network, reassembling them at the far end fast enough that the two people hear each other in something very close to real time.
This matters because the United Kingdom is in the middle of retiring its analogue telephone network. Openreach is moving every line onto an internet protocol platform, with the all-IP migration scheduled to complete in 2027, which means VoIP is becoming the default rather than a specialist alternative. Understanding how it works helps explain why call quality depends on the broadband connection and why a VoIP line behaves differently from the old copper service.
How Audio Becomes Digital Packets
A telephone call starts as sound: pressure waves that a microphone turns into a continuously varying electrical signal. VoIP cannot send that continuous signal directly, so it samples the audio many times per second and converts each sample into a number, a process known as analogue-to-digital conversion. The stream of numbers is then grouped into packets, each carrying a small slice of the conversation along with addressing information that tells the network where the packet should go.
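The sampling and packetising steps can be sketched in a few lines of Python. This is an illustration only, not how any particular handset is implemented. The 8,000 samples-per-second rate matches the 64 kbit/s G.711 figure at 8 bits per sample, while the 20 ms packet duration, the 16-bit quantisation, the 440 Hz test tone and the destination address are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE_HZ = 8000   # narrowband telephony rate assumed for this sketch
PACKET_MS = 20          # assumed packet duration; common, but not universal
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * PACKET_MS // 1000  # 160 samples


def sample_tone(freq_hz: float, duration_ms: int) -> list[int]:
    """Sample a sine tone and quantise each sample to a signed 16-bit integer."""
    count = SAMPLE_RATE_HZ * duration_ms // 1000
    return [
        round(32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ))
        for n in range(count)
    ]


def packetise(samples: list[int], destination: str) -> list[dict]:
    """Group samples into fixed-size packets tagged with order and address."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_PACKET)):
        packets.append({
            "seq": seq,                 # position of the packet in the stream
            "timestamp": start,         # index of the first sample it carries
            "dest": destination,        # hypothetical address label
            "payload": samples[start:start + SAMPLES_PER_PACKET],
        })
    return packets


# 100 ms of a 440 Hz tone stands in for microphone input.
packets = packetise(sample_tone(440.0, 100), "198.51.100.7:5004")
print(len(packets), len(packets[0]["payload"]))  # prints: 5 160
```

A real device would then pass each payload through a codec before sending it; here the raw samples are kept so the grouping is easy to see.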
Those packets travel independently across the internet and may even take different routes to reach the destination. At the far end the receiving device places them back in order, converts the numbers back into an electrical signal and drives a speaker so the listener hears the original voice. Because each packet is timestamped, the receiver can detect when packets arrive late or out of sequence, which is central to keeping the conversation intelligible. The whole cycle of capture, packetise, transmit and reassemble happens continuously throughout the call.
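As a rough sketch of the reassembly step, the receiver can sort what it has by sequence number and note any gaps. The `seq` field and the `reorder` helper are assumptions made for illustration; the text above mentions only timestamps, although real media streams typically carry both a timestamp and a sequence number.

```python
def reorder(received: list[dict], expected_count: int) -> tuple[list[dict], list[int]]:
    """Return packets sorted by sequence number, plus the numbers that never arrived."""
    by_seq = {p["seq"]: p for p in received}   # duplicate copies collapse to one
    ordered = [by_seq[s] for s in sorted(by_seq)]
    missing = [s for s in range(expected_count) if s not in by_seq]
    return ordered, missing


# Packets 1 and 2 arrive swapped, and packet 3 never arrives.
arrived = [
    {"seq": 0, "timestamp": 0},
    {"seq": 2, "timestamp": 320},
    {"seq": 1, "timestamp": 160},
    {"seq": 4, "timestamp": 640},
]
ordered, missing = reorder(arrived, expected_count=5)
print([p["seq"] for p in ordered], missing)  # prints: [0, 1, 2, 4] [3]
```

In practice the receiver works on a live stream rather than a finished list, so it has to decide how long to wait for a straggler before treating it as lost.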
The Role of SIP
Sending audio packets is only half of a telephone call. Something has to set the call up, make the other phone ring, agree how the audio will be encoded and tear the call down at the end. That signalling job is most commonly handled by the Session Initiation Protocol, usually shortened to SIP. SIP is the language two systems use to say who is calling whom, to negotiate the parameters of the session and to manage events such as answering, holding, transferring and hanging up.
It helps to separate the two streams. SIP carries the control messages, the equivalent of dialling and ringing, while the actual voice travels in a separate media stream once the call is connected. This separation is why a VoIP system can advertise which codecs it supports during call setup and then settle on one both ends understand before any speech is exchanged. SIP also underpins SIP trunking, the method by which businesses connect their telephone systems to the wider network over IP.
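The codec agreement can be pictured as a simple matching exercise. The sketch below models only the selection logic, under the assumption that the caller lists codecs in preference order and the first mutual match wins; it does not build or parse real SIP messages, and `choose_codec` is a hypothetical helper rather than part of any SIP library.

```python
from typing import Optional


def choose_codec(offered: list[str], supported: list[str]) -> Optional[str]:
    """Pick the first codec in the caller's preference order that the callee supports."""
    supported_set = set(supported)
    for codec in offered:
        if codec in supported_set:
            return codec
    return None   # no common codec, so the call cannot carry audio


caller_offer = ["G.729", "G.711"]    # caller prefers the low-bandwidth codec
callee_supports = ["G.711"]          # callee only understands G.711
print(choose_codec(caller_offer, callee_supports))  # prints: G.711
```

If the two lists share nothing, the function returns `None`, which corresponds to a call that fails during setup rather than one that connects in silence.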
Codecs: G.711, G.729 and the Trade-Offs
A codec is the component that encodes the sampled audio into digital form and decodes it again at the other end, and the choice of codec shapes both quality and bandwidth. The G.711 codec, standardised by the International Telecommunication Union, encodes voice at 64 kilobits per second per direction without compression, which delivers clear audio at the cost of higher bandwidth. The G.729 codec compresses the same voice to roughly 8 kilobits per second, using far less of the connection but applying more processing to do so.
The practical trade-off is straightforward. Where bandwidth is plentiful, an uncompressed codec such as G.711 keeps the audio faithful. Where bandwidth is constrained, a compressed codec such as G.729 fits more simultaneous calls into the same link, at the price of lossy compression that can slightly reduce audio fidelity. The table below sets out the main technical components of a VoIP call and what each one does.
| Component | Role in a VoIP call |
|---|---|
| SIP | Sets up, manages and ends the call session |
| G.711 codec | Encodes voice at 64 kbit/s per direction, no compression |
| G.729 codec | Compresses voice to around 8 kbit/s to save bandwidth |
| Media stream | Carries the encoded audio packets between parties |
| Jitter buffer | Smooths out packets that arrive at uneven intervals |
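The codec rates in the table describe the audio payload only; every packet also carries headers. The calculation below estimates the per-direction rate once those headers are included. It assumes 20 ms of audio per packet and a 40-byte IPv4, UDP and RTP header stack, and it ignores link-layer framing, so the results are indicative rather than exact for any given connection.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers


def per_direction_kbps(codec_kbps: float, packet_ms: int = 20) -> float:
    """Estimate the one-way rate in kbit/s, including assumed packet headers."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return total_bytes * 8 * packets_per_second / 1000


for name, rate in (("G.711", 64), ("G.729", 8)):
    print(name, per_direction_kbps(rate))
# prints:
# G.711 80.0
# G.729 24.0
```

Under these assumptions the headers add the same 16 kbit/s to either codec, which is why the saving from G.729 is smaller on the wire than the raw codec figures suggest.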
How Calls Are Routed
Once SIP has agreed the session and a codec has been chosen, the audio packets are addressed and pushed onto the network. They pass through the home or office router, across the broadband provider's network and on through the internet or a provider's voice network until they reach the destination. The packets carry the addressing needed for each network device to forward them towards their target, in the same way that any other internet traffic is routed, which is why VoIP does not need a dedicated circuit reserved end to end.
Because the packets share the network with everything else, the route they take and the conditions along it can vary moment to moment. A jitter buffer at the receiving end holds incoming packets briefly so that small variations in arrival time can be smoothed before the audio is played back. This buffering is a deliberate compromise: a little delay is traded for steadier, more intelligible sound. The receiver also handles packets that arrive out of order or not at all, which leads directly to the factors that determine quality.
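The delay-versus-steadiness compromise can be shown with a fixed-delay buffer. This is a simplified model under stated assumptions: packets are sent every 20 ms, playback of the first packet is held back by `buffer_ms`, and any packet arriving after its playback slot is discarded. The arrival times are invented for the example, and real jitter buffers are usually adaptive rather than fixed.

```python
PACKET_MS = 20   # assumed spacing between packets at the sender


def playout_schedule(arrivals_ms: dict[int, float], buffer_ms: float) -> dict[int, str]:
    """Classify each packet as 'played' or 'late' for a fixed-delay jitter buffer."""
    first_seq = min(arrivals_ms)
    base = arrivals_ms[first_seq] + buffer_ms   # when the first packet is played
    result = {}
    for seq, arrival in sorted(arrivals_ms.items()):
        deadline = base + (seq - first_seq) * PACKET_MS
        result[seq] = "played" if arrival <= deadline else "late"
    return result


# Invented arrival times in milliseconds; packet 2 is badly delayed.
arrivals = {0: 50.0, 1: 72.0, 2: 118.0, 3: 111.0, 4: 131.0}
print(playout_schedule(arrivals, buffer_ms=10))
print(playout_schedule(arrivals, buffer_ms=40))
```

With a 10 ms buffer, packet 2 misses its slot and is dropped; with a 40 ms buffer every packet is played, at the cost of 30 ms more delay on the whole conversation.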
What Affects Quality
Three technical factors dominate VoIP call quality. Jitter is the variation in the time between packets arriving; high jitter makes audio sound choppy because the receiver struggles to reassemble a steady stream. Packet loss occurs when packets fail to arrive at all, producing gaps or clipped words. Insufficient bandwidth, where the connection cannot carry the call alongside other traffic, forces packets to queue or be dropped and degrades the conversation. Latency, the overall delay end to end, can also make a call feel awkward even when the audio itself is clear.
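Jitter and packet loss can both be estimated from what the receiver observes. The sketch below uses the running interarrival jitter estimator defined for RTP, which is an assumption about the media transport since the text does not name one, and a simple loss percentage based on gaps in sequence numbers. The function names and sample values are illustrative.

```python
def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Running jitter estimate: smooth the change in transit time by a factor of 1/16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # Difference between receive spacing and send spacing for this pair.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_percent(received_seqs: list[int]) -> float:
    """Percentage of packets missing between the lowest and highest sequence seen."""
    expected = max(received_seqs) - min(received_seqs) + 1
    return 100 * (expected - len(set(received_seqs))) / expected


sent = [0, 20, 40, 60]
print(interarrival_jitter(sent, [50, 70, 90, 110]))   # steady arrivals: 0.0
print(interarrival_jitter(sent, [50, 78, 90, 110]))   # uneven arrivals: above zero
print(loss_percent([0, 1, 2, 4]))                     # one of five missing: 20.0
```

A constant delay produces zero jitter however large it is, which is why latency is treated as a separate factor from jitter.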
These factors explain why a VoIP line is only as good as the connection beneath it. Prioritising voice traffic on the local network, ensuring enough headroom in the broadband link and using a wired connection rather than a congested wireless one all help keep jitter and loss low. The codec choice interacts with this too: a compressed codec needs less bandwidth, but because it has already stripped out redundancy, each lost packet tends to be more audible when the connection is strained.
Frequently Asked Questions
What is VoIP?
VoIP stands for Voice over Internet Protocol, a family of technologies that carries telephone calls as digital data packets over the internet rather than over a dedicated analogue circuit. The voice is digitised, packetised, transmitted and reassembled at the far end in near real time. It is becoming the standard as the UK retires its analogue network by 2027.
How does a VoIP call travel from one phone to another?
The caller's voice is sampled and converted into digital packets, which are addressed and sent through the router, across the broadband network and over the wider internet to the destination. Each packet may take its own route and is reassembled in order at the far end. A jitter buffer smooths arrival times before the audio is played back.
What is SIP?
SIP, the Session Initiation Protocol, is the signalling standard most commonly used to set up, manage and end VoIP calls. It handles the control messages, such as making the phone ring and negotiating the codec, while the voice itself travels in a separate media stream. SIP also underpins SIP trunking used by businesses.
What bandwidth does VoIP need?
The bandwidth depends on the codec: the uncompressed G.711 codec uses 64 kilobits per second per direction, while the compressed G.729 codec uses around 8 kilobits per second. Additional overhead from packet headers adds to these figures in practice. Enough headroom is needed so voice traffic is not squeezed by other use of the connection.
What causes poor VoIP call quality?
The main causes are jitter (the uneven arrival of packets), packet loss (packets that fail to arrive at all) and insufficient bandwidth on the connection. High latency can also make a call feel awkward even when the audio is otherwise clear. Prioritising voice traffic and ensuring a stable, uncongested connection help keep quality high.