---
title: "Building a Self-Hosted WhatsApp Gateway"
description: "Why I split a WhatsApp messaging gateway across a cloud front door and a Mac at home, and how the two halves stay in sync, stay secure, and confirm every send."
date: 2026-06-18
category: Engineering
readingTime: "7 min read"
---


Plenty of HMD's software needs to send a WhatsApp message at some point: a delivery confirmation, an internal alert, a transactional notice. The easy route is to pay a third-party provider and forget about it. The catch is that your messages, your data, and your reliability then live in someone else's system, and you pay per message forever.

So I built our own. This is a post about how it is put together and why I made the choices I did. A word of honesty up front: it rides on [Baileys](https://github.com/WhiskeySockets/Baileys), an open-source, community-maintained WhatsApp client. It is not an official Meta product and I would never describe it as one. It is a tool we built ourselves on open components, and the interesting part is not the idea but the engineering around it.

## Two halves that do one job

The system is deliberately split in two.

The **gateway** is the front door. It lives in the cloud on Vercel, it is always reachable, and it is what the rest of our software talks to. It validates requests, persists messages, runs the admin dashboard, and enforces every security rule. It does not talk to WhatsApp directly.

The **worker** is the engine room. It is the only part that holds the live WhatsApp connection, and it runs on a dedicated Mac at a fixed location. It receives commands, executes them, and reports back.

Why split it at all? Because the two jobs have opposite needs. The front door has to be always-on and reachable from anywhere, which is exactly what [serverless infrastructure](/blog/nextjs-15-performance-patterns) is good at. The connection, by contrast, wants one stable, long-lived session from a consistent machine, which is exactly what serverless is bad at. Forcing both onto one box would compromise both. Separating them lets each half be the best version of itself, and it tucks the sensitive part, the live session, away on our own hardware rather than on the public internet.

The two halves never hold an open connection to each other, and that is deliberate. Commands travel from the gateway to the worker through a managed Redis instance: the gateway drops a signed command onto a queue, and the worker, which is listening, picks it up. Replies and incoming events travel back the other way, over ordinary outbound HTTPS requests that the worker makes to the gateway. The asymmetry is the point. Every connection the worker makes is **outbound**; it dials Redis and the gateway, and nothing on the public internet can dial it. The only port it opens at all is bound to the local machine, for a small admin screen, and is unreachable from the network. There is no public door to knock on.

## The worker looks after itself

A connection that needs babysitting is not a system, it is a chore. The worker is built so that nobody has to remember it exists.

It is compiled into a single self-contained binary for Apple silicon and registered as a user-level background agent through launchd, macOS's own service manager. launchd starts it on login and respawns it if it ever crashes. Session credentials are encrypted at rest, with the key held in the macOS Keychain, so a stolen disk image is not a stolen account. When the socket drops, it reconnects on its own; when configuration changes on the gateway, it picks that up on a short polling loop without needing a redeploy. The design goal was simple: it should survive a reboot, a crash, and a bad network without a human in the loop.

## Making a send actually mean something

This is the part I am most pleased with.

Most messaging code is fire-and-forget. You hand a message to the provider, you get back "accepted", and you move on. But "accepted into a queue" and "delivered" are two different claims, and the gap between them is where the confusing bugs live. I wanted a caller to be able to ask for a real answer: did this message actually go out, yes or no?

That turns out to be harder than it sounds, because the answer has to travel back across every layer: from Baileys on the Mac, to the worker, onto the reply channel, back to the gateway, and out to the original caller, all while an HTTP request is still open and waiting. The flow now works like this:

1. A caller posts a message and asks for confirmation.
2. The gateway validates everything, writes the message to the database as *queued*, and **subscribes to the reply channel before it publishes the command**. This ordering matters more than it looks: the worker can answer in milliseconds, and if the gateway published first and subscribed second, it could miss its own reply.
3. The worker receives the command, sends it through Baileys, and publishes back a small acknowledgement: sent, with the provider's message id, or failed, with a coded reason.
4. The gateway's open subscription resolves, and it returns a real status to the caller.

The ordering in step two is the whole game, and it is a bug you tend to meet exactly once. Redis pub/sub has no memory: a message published to a channel with no live subscriber is not queued for later, it is simply gone. The worker often acknowledges within a few milliseconds of receiving the command, which is faster than the round trip it takes the gateway's own subscribe call to register on the server. So the naive order, publish the command and then start listening for the reply, loses the race more often than you would believe. The acknowledgement fires into a channel nobody is listening on yet, vanishes, and the caller waits out the entire timeout for a message that in fact sent perfectly. The fix is to subscribe first, and to wait for the subscription to be confirmed ready, before publishing a single byte of the command. Only once the ear is definitely open do we speak.

If the acknowledgement does not arrive in time, the request does not hang or lie. It returns *queued* and lets the caller decide whether to poll or fall back. The honest answer, including "I do not know yet", is always better than a hopeful guess.
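The subscribe-first ordering and the honest timeout can be sketched as follows. This is a minimal illustration rather than the gateway's actual code: `MemoryBus` is an in-memory stand-in for Redis pub/sub so the example runs on its own, and `sendAndConfirm`, `dispatch`, and the `ack:` channel prefix are hypothetical names.

```typescript
import { randomUUID } from "node:crypto";

type Ack =
  | { status: "sent"; providerMessageId: string }
  | { status: "failed"; code: string };
type SendResult = Ack | { status: "queued" };

// In-memory stand-in for Redis pub/sub. Like the real thing it has no memory:
// a message published to a channel with no subscriber is simply dropped.
class MemoryBus {
  private handlers = new Map<string, (payload: string) => void>();

  // Resolves only once the handler is registered, mirroring a confirmed subscribe.
  async subscribe(channel: string, handler: (payload: string) => void): Promise<void> {
    this.handlers.set(channel, handler);
  }

  unsubscribe(channel: string): void {
    this.handlers.delete(channel);
  }

  // Returns the number of subscribers that received the message.
  publish(channel: string, payload: string): number {
    const handler = this.handlers.get(channel);
    if (!handler) return 0;
    handler(payload);
    return 1;
  }
}

async function sendAndConfirm(
  bus: MemoryBus,
  dispatch: (replyChannel: string) => void, // hypothetical: hands the command to the worker
  timeoutMs: number,
): Promise<SendResult> {
  // Unguessable reply channel, minted fresh for this request.
  const replyChannel = `ack:${randomUUID()}`;

  let onAck: (result: SendResult) => void = () => {};
  const settled = new Promise<SendResult>((resolve) => {
    onAck = resolve;
  });

  // 1. Subscribe, and wait until the subscription is confirmed.
  await bus.subscribe(replyChannel, (payload) => onAck(JSON.parse(payload) as Ack));

  // 2. Only now hand the command to the worker.
  dispatch(replyChannel);

  // 3. Wait for the acknowledgement, or answer "queued" when time runs out.
  const timer = setTimeout(() => onAck({ status: "queued" }), timeoutMs);
  try {
    return await settled;
  } finally {
    clearTimeout(timer);
    bus.unsubscribe(replyChannel);
  }
}
```

Swapping steps 1 and 2 in this sketch reproduces the bug: a worker that answers immediately publishes into a channel with no subscriber, `publish` returns 0, and the caller waits out the full timeout.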

There is a subtle case worth mentioning. If an identical request asking for confirmation arrives while the first is still in flight, a unique key in the database catches the duplicate before its command can be published a second time; the second request finds the original and waits on the same acknowledgement rather than sending again. Idempotency has to be designed in from the first line; it is not something you bolt on once the bugs show up.
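A rough sketch of that duplicate handling, under loud assumptions: the `Map` below stands in for the database table with its unique key, and `sendOnce` and the key format are hypothetical names chosen for illustration. It covers only the in-flight case described above.

```typescript
type Outcome = { status: "sent" | "failed" | "queued" };

// In-memory stand-in for a table with a unique constraint on the idempotency key.
const inFlight = new Map<string, Promise<Outcome>>();

function sendOnce(key: string, send: () => Promise<Outcome>): Promise<Outcome> {
  // Duplicate request: share the original's acknowledgement instead of sending again.
  const existing = inFlight.get(key);
  if (existing) return existing;

  // First request with this key: start the send and record it before returning.
  const pending = send();
  inFlight.set(key, pending);

  // Forget the key once the send settles, whether it succeeded or failed.
  const clear = () => {
    inFlight.delete(key);
  };
  pending.then(clear, clear);

  return pending;
}
```

Because the lookup and the insert happen without an `await` between them, two calls with the same key in one process cannot both start a send. Across several gateway instances that guarantee has to come from the database constraint itself.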

## One contract, obeyed by both sides

Split systems drift. The two halves quietly start disagreeing about what a valid message looks like, and you get bugs that only appear in the seam between them.

The defence is a single shared package that defines every contract: the database shapes, the wire protocol between gateway and worker, the public API, and the full list of error codes. Both halves import the same definitions and validate against the same schemas, so a message that is valid on one side is valid on the other by construction. When the protocol changes, the schema and both consumers change in the same commit. There is one source of truth, and it is enforced in code, not in a document nobody reads.
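As an illustration of the idea, a shared contract module might look something like the sketch below. The post does not name the schema library, so this uses hand-written type guards; the field names, the error codes, and the `parseSendCommand` and `parseAck` helpers are all hypothetical, and in a real shared package they would be exported for both halves to import.

```typescript
// Hypothetical list of error codes; the real list lives in the shared package.
const ERROR_CODES = ["INVALID_RECIPIENT", "NOT_CONNECTED", "SEND_TIMEOUT"] as const;
type ErrorCode = (typeof ERROR_CODES)[number];

// Command sent from gateway to worker (field names are assumptions).
interface SendCommand {
  id: string;
  to: string;
  text: string;
}

// Acknowledgement sent from worker to gateway.
type Ack =
  | { id: string; status: "sent"; providerMessageId: string }
  | { id: string; status: "failed"; code: ErrorCode };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a typed command, or null when the input does not match the contract.
function parseSendCommand(input: unknown): SendCommand | null {
  if (!isRecord(input)) return null;
  const { id, to, text } = input;
  if (typeof id !== "string" || id.length === 0) return null;
  if (typeof to !== "string" || to.length === 0) return null;
  if (typeof text !== "string" || text.length === 0) return null;
  return { id, to, text };
}

// Returns a typed acknowledgement, or null for unknown shapes or error codes.
function parseAck(input: unknown): Ack | null {
  if (!isRecord(input)) return null;
  const { id, status, providerMessageId, code } = input;
  if (typeof id !== "string") return null;
  if (status === "sent" && typeof providerMessageId === "string") {
    return { id, status, providerMessageId };
  }
  if (status === "failed" && typeof code === "string" && ERROR_CODES.some((c) => c === code)) {
    return { id, status, code: code as ErrorCode };
  }
  return null;
}
```

The gateway would call `parseSendCommand` before publishing and the worker would call it again on receipt, so neither side trusts a shape the other side would reject.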

## Treating everything as hostile

Security was not bolted on afterwards; it shaped the structure.

Every input is validated at every trust boundary before it is trusted, whether it arrives over HTTP or off the queue. Messages between the gateway and the worker are signed, and each carries a one-time nonce and a timestamp, so a captured message cannot be replayed later. API access is granted through scoped keys (a key that may send notifications cannot necessarily send anything else), and traffic is rate-limited at several levels so no single caller can swamp the system. Sensitive actions are written to an append-only, hash-chained audit log, so the record of what happened cannot be quietly edited. Every webhook destination we deliver to is checked before we connect, so it cannot be pointed at an internal address: private, loopback, and link-local ranges are rejected, and the connection is pinned to the address we already resolved. Logs are structured, and the gateway redacts sensitive fields before anything is written, so credentials and message contents are kept out of them. The working assumption throughout is that anything from outside is hostile until proven otherwise.
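To make the signing and replay protection concrete, here is a minimal sketch using Node's built-in `crypto` module. The post does not say which algorithm or envelope layout is used, so HMAC-SHA256, the 30-second window, the in-memory nonce set, and the `sign` and `verify` helpers are all assumptions for illustration.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

interface Envelope {
  body: string; // JSON-encoded command
  nonce: string; // one-time value, unique per message
  timestamp: number; // milliseconds since the epoch
  signature: string; // hex-encoded HMAC-SHA256
}

// Assumed freshness window; the real value is not stated in the post.
const MAX_SKEW_MS = 30_000;

// The MAC covers timestamp, nonce, and body, so none can be altered independently.
function mac(secret: string, body: string, nonce: string, timestamp: number): Buffer {
  return createHmac("sha256", secret).update(`${timestamp}.${nonce}.${body}`).digest();
}

function sign(secret: string, body: string, now: number = Date.now()): Envelope {
  const nonce = randomUUID();
  const signature = mac(secret, body, nonce, now).toString("hex");
  return { body, nonce, timestamp: now, signature };
}

// Nonces already accepted. A real deployment would need shared storage with expiry.
const seenNonces = new Set<string>();

function verify(secret: string, env: Envelope, now: number = Date.now()): boolean {
  // Reject messages outside the freshness window.
  if (Math.abs(now - env.timestamp) > MAX_SKEW_MS) return false;

  // Compare signatures in constant time; lengths must match first.
  const expected = mac(secret, env.body, env.nonce, env.timestamp);
  const given = Buffer.from(env.signature, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;

  // Record the nonce only after the signature checks out, then refuse replays.
  if (seenNonces.has(env.nonce)) return false;
  seenNonces.add(env.nonce);
  return true;
}
```

The timestamp bounds how long a nonce has to be remembered: anything older than the window is rejected on age alone, so the nonce store only needs to cover that window.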

There is one deliberate exception to the signing, and the reasoning behind it matters more than the rule itself. The acknowledgements the worker fires back to confirm a send are not signed. At first glance that looks like a hole. It is not. The channel each one travels on is named after a freshly minted, server-side identifier that an attacker cannot guess, the gateway only ever listens on channels it created moments earlier, and the worst a forged acknowledgement could achieve is to flip the reported status of a single message already in flight, never to cause a send. Signing it would add a cryptographic round trip to the hottest path in the system, to defend against an attack the channel design already rules out. Knowing where *not* to spend a security primitive is as much the job as knowing where to.

The split earns its keep here too. The half exposed to the internet, the gateway, holds no WhatsApp session and cannot send anything by itself; it can only ask the worker to. The half that holds the crown jewels, the live session that is effectively the account itself, is the half nothing outside can reach. Force the front door and you find a machine that can queue requests but holds no credentials and no connection. The valuable thing and the reachable thing are, by design, never the same thing.

## What you gain, and what it costs

Running your own gateway gives you control, privacy, and the ability to make it as reliable as you are willing to engineer it. None of that is free. You own the connection, the uptime, the security, and the maintenance, and there is no support line to call when something breaks at midnight. For most teams, paying a provider is the right call.

For us it was worth it, not because the idea is rare, but because the execution is where the real work lives: the split that lets each half do its job, the worker that heals itself, the send that waits for a real answer, and the single contract that keeps both sides honest. It took roughly 125 commits between April and June 2026 to get there, and almost none of that work shows in the description above. That is rather the point. Build the boring parts properly and the clever parts take care of themselves.
