Alex Kreidler


Building a virtual phone: VoIP, SIP, modems, and the AT protocol

Oct 14, 2023

Let’s say you want a VoIP (Voice over IP) phone that can call, and receive calls from, regular 10-digit telephone numbers on the Public Switched Telephone Network (PSTN). Why? Maybe you want advanced conference calling, call recording, or phone menu features, or you just want to be able to call people from your laptop or other devices.

You’ll need the Session Initiation Protocol (SIP), a text-based protocol somewhat similar to HTTP, but for setting up voice and video calls over the Internet. SIP connects your client (software that lets you dial a number and uses your computer’s audio input and output for the call) to a gateway that translates SIP messages into real calls on the PSTN.

Now how can we do this in the cheapest way possible? There are a few options. You can purchase a SIP trunking service from a company that runs the SIP server and PSTN gateway for you, which usually costs around 1 cent per minute. You can try some of the open-source software that turns a USB modem into a SIP gateway. Or you can build it yourself by connecting SIP to a cellular modem with a SIM card.

Option 1 (buying):

You pay for each number you reserve, plus per call minute. I’ve only included prices for outbound calls.

https://voip.ms/residential/pricing

Calling: $0.01/min => $0.60/hour

Number: $0.40 setup + $0.85/month

https://sonetel.com/en/prices/international-calls/

Calling: $0.011/min => $0.66/hour

Number: $2/month

They also have a $14/month premium plan that comes with 1 free number reserved. You’d need to spend more than about 18 hours a month calling (the $12 difference divided by $0.66/hour) to make it worthwhile.

https://www.plivo.com/sip-trunking/pricing/us/

Calling: $0.0065/min => $0.39/hour

Number: $0.50/month
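To compare these plans for a given amount of calling, the arithmetic can be sketched in a few lines of Python. The rates are the ones listed above; the `PLANS` table and `monthly_cost` helper are names made up for this illustration, and setup fees and inbound rates are ignored.

```python
# Outbound per-minute rate and monthly number fee (USD), copied from the price lists above.
PLANS = {
    "voip.ms": {"per_min": 0.01, "number_per_month": 0.85},
    "Sonetel": {"per_min": 0.011, "number_per_month": 2.00},
    "Plivo": {"per_min": 0.0065, "number_per_month": 0.50},
}


def monthly_cost(plan: str, hours_per_month: float) -> float:
    """Estimated monthly bill: number rental plus outbound minutes."""
    p = PLANS[plan]
    return p["number_per_month"] + p["per_min"] * 60 * hours_per_month


if __name__ == "__main__":
    for name in PLANS:
        print(f"{name}: ${monthly_cost(name, 5):.2f} for 5 hours/month")
```

At low usage the monthly number fee dominates, so the cheapest per-minute rate is not automatically the cheapest plan.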

Option 2 (use SIP software with modem support):

Chan_dongle is a plugin for Asterisk that supports Huawei modems. Chan_dongle_extended is a fork that has worked on fixing some bugs.

Asterisk is a large piece of software that I’ve heard can be complex to configure and get running, so I haven’t tried either of these personally.

Option 3 (building):

You buy a modem that supports a SIM card or eSIM, buy the card itself, and write the code to connect SIP to the modem. Computers communicate with modems via AT commands, a text-based protocol sent over a serial connection that covers basic operations. Modems often use other proprietary protocols for data/network connections, but for setting up voice calls AT is sufficient. Modems then use the GSM, GPRS, and now LTE standards maintained by the 3GPP partnership to communicate with cell towers. The main problem is that most modems are very poorly documented and have subtly varying edge cases between models. You may need to dig through datasheets like this or this or this one that has a nice walk-through.
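As a rough illustration of what talking AT to a modem involves, the sketch below frames a command and classifies a modem’s reply without touching real hardware. The final result codes (`OK`, `ERROR`, `NO CARRIER`, `BUSY`, `NO ANSWER`, `NO DIALTONE`, `+CME ERROR`) are the standard ones, but the helper names are hypothetical and real modems often add vendor-specific unsolicited lines, so treat this as a starting point only.

```python
from __future__ import annotations

FINAL_RESULT_CODES = ("OK", "ERROR", "NO CARRIER", "BUSY", "NO ANSWER", "NO DIALTONE")


def frame_command(command: str) -> bytes:
    """AT commands are ASCII text terminated by a carriage return."""
    return (command + "\r").encode("ascii")


def parse_response(raw: bytes) -> tuple[list[str], str | None]:
    """Split a modem reply into informational lines and the final result code.

    Returns (info_lines, final), where final is None if no final result
    code has arrived yet (the caller should keep reading).
    """
    text = raw.decode("ascii", errors="replace")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    info: list[str] = []
    for ln in lines:
        if ln in FINAL_RESULT_CODES or ln.startswith("+CME ERROR"):
            return info, ln
        info.append(ln)
    return info, None


# Example: a typical reply to the signal-quality query AT+CSQ.
print(frame_command("AT+CSQ"))
print(parse_response(b"\r\n+CSQ: 20,99\r\n\r\nOK\r\n"))
```

With real hardware, the bytes from `frame_command` would be written to the modem’s serial port and the reply read back until `parse_response` reports a final result code.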

If you found an eSIM plan that supports voice calling (few do) and that was very cheap, say $3 per 7 days, then you might be able to run a SIP gateway for less than the third-party services listed above in Option 1.

$3/7 days is $0.43/day = $0.0179/hour = $0.000298/min, or about 1/22 of Plivo’s rate.

But that assumes you make calls 24/7, which you probably wouldn’t, and it excludes the fixed cost of the modem hardware and a server such as a Raspberry Pi. Also, you’d need to set up eSIM profile management, which can be complex; Ubuntu’s ModemManager currently doesn’t support that.
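The break-even point makes the comparison more concrete. This sketch uses only the figures quoted above ($3 per 7 days for the eSIM, $0.0065/min for Plivo) and ignores hardware, number rental, and power; the variable names are just for illustration.

```python
ESIM_PRICE_PER_WEEK = 3.00      # USD, the hypothetical plan from the text
PLIVO_PER_MIN = 0.0065          # USD per outbound minute
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

# Effective eSIM rate if the line were in use around the clock.
esim_per_min_24_7 = ESIM_PRICE_PER_WEEK / MINUTES_PER_WEEK

# Weekly minutes at which Plivo's per-minute charges equal the flat eSIM price.
break_even_minutes = ESIM_PRICE_PER_WEEK / PLIVO_PER_MIN

print(f"eSIM at 24/7 use: ${esim_per_min_24_7:.6f}/min")
print(f"Plivo costs {PLIVO_PER_MIN / esim_per_min_24_7:.1f}x that rate")
print(f"Break-even: {break_even_minutes:.0f} min/week ({break_even_minutes / 60:.1f} hours)")
```

Under these assumptions the flat-rate eSIM only wins if you talk for more than roughly 7.7 hours every week.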

There are some cool libraries that could help you implement a SIP gateway, like pyVoIP, although its GitHub issues indicate it has pretty bad sound quality and a fair number of bugs.

Viska is a work-in-progress Rust library for SIP. It’s not ready for production use yet, but it’s cool to see someone working on this.

SIP and AT protocol example

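SIP can be very complex, supporting multiple users and streaming formats. Here’s a simplified example. Let’s say Alice wants to call Bob, so her client sends an INVITE request whose body is written in the Session Description Protocol (SDP).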

INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP alicepc.example.com:5060
Max-Forwards: 70
From: <sip:alice@example.com>;tag=1234
To: <sip:bob@example.com>
Call-ID: 5678
CSeq: 1 INVITE
Contact: <sip:alice@alicepc.example.com>
Content-Type: application/sdp
Content-Length: 152

v=0
o=alice 2890844526 2890844526 IN IP4 alicepc.example.com
s=-
c=IN IP4 alicepc.example.com
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000

The SDP body contains the session description:

  • v: Protocol version.
  • o: Originator of the session (Alice), including the session ID, version, network type, and address.
  • s: Session name or description.
  • c: Connection data, including the network type and address.
  • t: Timing information, indicating the start and stop time of the session.
  • m: Media information, such as audio, specifying the port number, transport protocol, and payload type.
  • a: Attributes, like the codec used (PCMU) and its clock rate (8000 Hz).
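A minimal sketch of reading those fields in Python, assuming the simple single-media body shown above. `parse_sdp` is a hypothetical helper, not part of pyVoIP or any other library, and it ignores most of what real SDP allows.

```python
from __future__ import annotations

SDP_BODY = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 alicepc.example.com\r\n"
    "s=-\r\n"
    "c=IN IP4 alicepc.example.com\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)


def parse_sdp(body: str) -> dict[str, list[str]]:
    """Group SDP lines by their one-letter type, e.g. {"m": ["audio 49170 RTP/AVP 0"]}."""
    fields: dict[str, list[str]] = {}
    for line in body.splitlines():
        if not line:
            continue
        key, _, value = line.partition("=")
        fields.setdefault(key, []).append(value)
    return fields


sdp = parse_sdp(SDP_BODY)
# The media line is "<media> <port> <transport> <payload type>".
media, port, transport, payload_type = sdp["m"][0].split()
# The rtpmap attribute is "rtpmap:<payload type> <codec>/<clock rate>".
codec, clock_rate = sdp["a"][0].split(" ", 1)[1].split("/")
print(media, port, transport, payload_type, codec, clock_rate)
```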

This SIP session initiates a call from Alice to Bob, and the SDP provides details about the media parameters and capabilities for the call.
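To show how the headers and body fit together, here is a hedged sketch that assembles a similar INVITE as a string and computes `Content-Length` from the body. The addresses (`alice@example.com`, `bob@example.com`), tag, and Call-ID are placeholders, `build_invite` is a hypothetical helper, and a real user agent would also need a `branch` parameter on `Via`, unique identifiers, and retransmission handling.

```python
def build_invite(caller: str, callee: str, host: str, sdp_body: str) -> str:
    """Assemble a bare-bones SIP INVITE; SIP uses CRLF line endings throughout."""
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host}:5060",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1234",
        f"To: <sip:{callee}>",
        "Call-ID: 5678",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{host}>",
        "Content-Type: application/sdp",
        # Content-Length counts the bytes of the body only, not the headers.
        f"Content-Length: {len(sdp_body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp_body


body = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 alicepc.example.com",
    "s=-",
    "c=IN IP4 alicepc.example.com",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]) + "\r\n"

invite = build_invite("alice@example.com", "bob@example.com", "alicepc.example.com", body)
print(invite)
```

Computing the length from the encoded body avoids hand-counting mistakes; for this body with CRLF line endings it comes out to 152 bytes.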

| SIP Command | Purpose | AT Command | Purpose |
| --- | --- | --- | --- |
| INVITE | Initiates a session setup request. | ATD (Dial) | Initiates a call |
| BYE | Terminates a session. | ATH (Hang-up) | Hangs up an active call |
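A gateway’s core job is the translation in this table. The sketch below maps the two SIP methods to AT commands, pulling the phone number out of the request URI. `sip_to_at` and the example URI are hypothetical; the trailing semicolon on `ATD` is what requests a voice call rather than a data call on GSM-style modems, though individual modems may need extra setup commands.

```python
import re


def sip_to_at(method: str, request_uri: str = "") -> str:
    """Translate a SIP request into the AT command a modem would need."""
    if method == "INVITE":
        # Take the user part of e.g. "sip:+15551234567@gateway.example.com".
        match = re.match(r"sips?:(\+?\d+)@", request_uri)
        if match is None:
            raise ValueError(f"no phone number in request URI: {request_uri!r}")
        return f"ATD{match.group(1)};"
    if method == "BYE":
        return "ATH"
    raise ValueError(f"unsupported SIP method: {method}")


print(sip_to_at("INVITE", "sip:+15551234567@gateway.example.com"))  # ATD+15551234567;
print(sip_to_at("BYE"))  # ATH
```

This covers only call setup and teardown; carrying the audio between the RTP stream and the modem is a separate problem.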

Conclusion

Overall, I realized that the cost of the modem hardware and SIM card, plus the time and complexity of building even a simple SIP gateway (even with some promising libraries), is too high to be worth it.

It’s much easier just to connect one of the many great SIP clients to a solid service like Plivo so I only pay for the minutes I use.

But it’s been a fun journey diving all the way from the user experience and networking layer of voice calls down to the hardware, and thinking about how to improve the system for a cheaper and better calling experience.