When Chat Guard hit 1,000 servers, we thought that was a milestone. When it hit 7,500 servers with over 15 million users across all of them, the team realized we were just getting started - and that the original architecture needed a complete rethink.
The Naive Architecture
The first version of Chat Guard was a single Node.js process. One index.js file, one database connection, one event loop handling everything. It worked brilliantly for 50 servers. At 500, it started showing cracks. At 1,000, it was held together by optimism and setInterval.
Here's what the early architecture looked like:
Single Node.js Process
├── Discord.js Client (one shard)
├── MongoDB Connection (single)
├── Command Handler
├── Event Listeners (message, join, leave)
└── Moderation Logic
The problems were predictable: memory leaks from event listeners that were never removed, database connection timeouts under load, and the occasional complete crash when Discord's gateway hiccuped.
Sharding: The First Real Architecture Decision
Discord forces sharding once your bot exceeds 2,500 servers. But the smart move is to shard earlier. Each shard maintains its own WebSocket connection to Discord's gateway, handling a subset of your total servers.
We implemented sharding at around 800 servers - before Discord required it - and it was one of the best architectural decisions made early on. Each shard runs as an independent process, isolated from the others. If one crashes, the rest keep running.
Shard Manager
├── Shard 0 (servers 0-999)
├── Shard 1 (servers 1000-1999)
├── Shard 2 (servers 2000-2999)
└── ...
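The contiguous ranges in the diagram are a simplification. Discord actually routes each guild to a shard with a documented formula: `shard_id = (guild_id >> 22) % num_shards`. Since guild IDs are 64-bit snowflakes, a Node implementation needs BigInt:

```javascript
// Discord's documented shard routing formula:
//   shard_id = (guild_id >> 22) % num_shards
// Guild IDs are 64-bit snowflakes, too large for a plain Number,
// so the shift is done with BigInt.
function shardForGuild(guildId, numShards) {
  return Number((BigInt(guildId) >> 22n) % BigInt(numShards));
}
```

A useful property of this scheme: all events for a given guild always land on the same shard, so per-guild state never has to be coordinated across processes.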
The Database Layer
MongoDB was the right choice for this use case, and I'll defend that position. Discord bot data is inherently document-shaped: each server has its own configuration, its own log entries, its own moderation history. The schema varies per server because admins customize their setups differently.
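To make "document-shaped" concrete, here is a sketch of what a per-server document might look like (field names are illustrative, not Chat Guard's actual schema):

```json
{
  "guildId": "197038439483310086",
  "prefix": "!",
  "features": ["antispam", "wordfilter"],
  "wordfilter": {
    "blockedTerms": ["..."],
    "action": "delete"
  },
  "modLogChannelId": "255515049286643712"
}
```

One server might have a deeply nested `wordfilter` config while another has none at all, which is exactly the kind of per-document variance a rigid relational schema fights against.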
But MongoDB at scale requires discipline:
- Index everything you query. This sounds obvious until you're debugging a query that takes 8 seconds because you forgot to index a compound field.
- Use connection pooling aggressively. Each shard needs its own pool, not a shared connection.
- Batch writes. Don't write to the database on every message event. Buffer, batch, flush. I learned this the hard way when our write throughput hit MongoDB's limits.
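The batching pattern above can be sketched as a small write buffer. This is an illustrative version, not Chat Guard's actual implementation: `flushFn` stands in for a MongoDB bulk operation such as `collection.insertMany`, and the size/interval thresholds are made-up defaults.

```javascript
// Buffer log entries in memory and flush them in bulk, either when the
// buffer fills up or on a timer - instead of one insert per message event.
class WriteBuffer {
  constructor(flushFn, { maxSize = 500, intervalMs = 2000 } = {}) {
    this.flushFn = flushFn; // e.g. batch => collection.insertMany(batch)
    this.maxSize = maxSize;
    this.buffer = [];
    this.timer = setInterval(() => this.flush(), intervalMs);
  }

  push(doc) {
    this.buffer.push(doc);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.flushFn(batch); // one bulk write instead of batch.length single writes
  }

  stop() {
    clearInterval(this.timer);
    this.flush(); // drain whatever is left on shutdown
  }
}
```

The flush-on-shutdown call in `stop()` matters: without it, a deploy or crash loses everything still sitting in the buffer.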
I wrote more about the database patterns specifically in MongoDB Patterns I Learned from 7,500 Discord Servers.
Rate Limiting: Respecting Discord's API
Discord's rate limits are strict, and for good reason. Every API call - banning a user, sending a message, modifying a channel - has a rate limit. Exceed one, and your requests start failing with 429 responses. Exceed them repeatedly, and your bot risks being temporarily banned from the API.
The team built a queue system using an in-memory priority queue with exponential backoff. Every API call goes through the queue. The queue respects per-route rate limits, per-guild rate limits, and global rate limits. It sounds like overengineering until you realize the alternative is your bot randomly failing to moderate because it hit a rate limit mid-operation.
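Here is a deliberately simplified sketch of the backoff half of that design. It serializes calls through a single queue and retries on a simulated 429; a production version would additionally keep one queue per rate-limit bucket and read the retry delay from Discord's response headers. The delay constants are illustrative, not Discord's.

```javascript
// Exponential backoff: 250ms, 500ms, 1s, ... capped at 10s (illustrative values).
function backoffDelay(attempt, baseDelayMs = 250) {
  return Math.min(baseDelayMs * 2 ** attempt, 10_000);
}

// All API calls run strictly one at a time through a promise chain.
// A real implementation would hold one such queue per rate-limit route.
class ApiQueue {
  constructor({ maxRetries = 5 } = {}) {
    this.maxRetries = maxRetries;
    this.chain = Promise.resolve();
  }

  enqueue(call) {
    const run = async () => {
      for (let attempt = 0; ; attempt++) {
        try {
          return await call();
        } catch (err) {
          // Retry only on rate-limit errors, up to maxRetries
          if (err.status !== 429 || attempt >= this.maxRetries) throw err;
          await new Promise(r => setTimeout(r, backoffDelay(attempt)));
        }
      }
    };
    const result = this.chain.then(run, run);
    this.chain = result.catch(() => {}); // keep the chain alive after failures
    return result;
  }
}
```

Routing every call through a queue like this also gives you one place to add metrics: queue depth is a leading indicator that you are approaching a limit.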
Event Processing Pipeline
At 7,500+ servers, Chat Guard processes thousands of events per second. Every message, every join, every leave, every reaction - they all flow through the event pipeline.
The pipeline is straightforward:
- Receive event from Discord gateway
- Filter - is this server using this feature?
- Validate - does the bot have the required permissions?
- Process - execute the moderation logic
- Log - write to the audit log (batched)
The critical insight was step 2. Most events can be discarded immediately because the server doesn't have the relevant feature enabled. This early-exit pattern reduced processing load by roughly 70%.
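The early-exit step can be sketched in a few lines. The config lookup here is a plain Map standing in for a cache in front of the database; event shape and feature names are hypothetical.

```javascript
// guildId -> { features: Set<string> } - in production this would be a
// cache layered over the per-server config collection.
const serverConfig = new Map();

// Returns true only if the event made it past the cheap feature-flag check
// (step 2) and into the expensive validate/process/log stages (steps 3-5).
function handleEvent(event, process) {
  const config = serverConfig.get(event.guildId);
  if (!config || !config.features.has(event.feature)) return false; // early exit
  process(event);
  return true;
}
```

The point is ordering: the feature-flag check costs one map lookup, while permission validation and moderation logic cost API calls and database reads. Putting the cheapest check first is what makes discarding ~70% of events nearly free.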
What I'd Do Differently
If the team were building Chat Guard from scratch today:
- TypeScript from day one. The refactoring pain of adding types to a large JavaScript codebase is real.
- Redis for caching. We relied too heavily on in-memory caches that reset on restart.
- Better observability. We added structured logging too late. For the first year, debugging production issues meant reading through console.log statements.
The Numbers
- 7,500+ Discord servers
- 15M+ end users reached
- 99.7% uptime over the last year
- <100ms average command response time
These numbers didn't happen by accident. They happened because every scaling problem forced a better architectural decision.
Building a Discord bot? Start with sharding early and batch your database writes. Future you will appreciate it.