---
title: "MongoDB Patterns I Learned from 7,500 Discord Servers"
description: "Practical MongoDB patterns for high-throughput applications - from schema design to indexing strategies, learned the hard way running Discord bots at scale."
date: 2024-11-10T00:00:00.000Z
category: Engineering
readingTime: "3 min read"
---


When your database serves 7,500 Discord servers with millions of records, you learn MongoDB patterns that no tutorial covers. These are the patterns we wish someone had told us about before deploying to production.

## Pattern 1: The Per-Guild Document

The most natural schema for Discord bots is one document per server (guild). Each document contains that server's configuration, moderation settings, and metadata.

```json
{
  "_id": "guild_123456789",
  "prefix": "!",
  "features": {
    "automod": true,
    "logging": true,
    "welcomeMessage": false
  },
  "moderators": ["user_111", "user_222"],
  "updatedAt": "2023-01-15T10:30:00Z"
}
```

This pattern works because Discord data is naturally partitioned by guild. You almost never query across guilds - each operation is scoped to one server.

**Key insight:** Use the guild ID as `_id`. MongoDB's `_id` index is free. Don't waste it on an autogenerated ObjectId when you have a perfect natural key.
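
In practice that means the bot can upsert a default document the moment it joins a guild. A minimal sketch (helper names and default values are ours, mirroring the document above; assumes the Node.js `mongodb` driver's `updateOne`/`findOne`):

```javascript
// Default config mirroring the document shape above; the guild ID doubles
// as MongoDB's _id, so no ObjectId is ever generated.
function defaultGuildConfig(guildId) {
  return {
    _id: `guild_${guildId}`,
    prefix: "!",
    features: { automod: true, logging: true, welcomeMessage: false },
    moderators: [],
    updatedAt: new Date().toISOString(),
  };
}

// On guild join: create the document if it doesn't exist, and never
// overwrite settings an admin already changed ($setOnInsert only
// applies when the upsert actually inserts).
async function ensureGuildConfig(guilds, guildId) {
  const defaults = defaultGuildConfig(guildId);
  await guilds.updateOne(
    { _id: defaults._id },
    { $setOnInsert: defaults },
    { upsert: true }
  );
  return guilds.findOne({ _id: defaults._id });
}
```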

## Pattern 2: Capped Collections for Logs

Moderation logs grow fast. At scale, a single server might generate thousands of log entries per month. Storing all of them indefinitely is expensive and unnecessary - most admins only look at the last 30 days.

MongoDB's capped collections are purpose-built for this:

```javascript
db.createCollection("modlogs", {
  capped: true,
  size: 104857600, // 100MB
  max: 500000      // 500k documents
})
```

Capped collections automatically evict the oldest documents when the size or document limit is reached. No TTL indexes, no cleanup jobs, no manual pruning. The trade-offs: documents can't grow once inserted, and deleting individual documents wasn't supported before MongoDB 5.0 - but for append-only logs, neither restriction matters.
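
One practical question is how big to make the cap. Eviction is size-based, not time-based, so you derive the byte cap from expected write volume and a soft retention window. A back-of-the-envelope sketch (the numbers are illustrative, not from our deployment):

```javascript
// Size a capped collection from an average document size, a write rate,
// and the retention window you'd like eviction to roughly honor.
function cappedSizeBytes(avgDocBytes, docsPerDay, retentionDays) {
  const raw = avgDocBytes * docsPerDay * retentionDays;
  // MongoDB rounds the cap up to a multiple of 256 bytes; do the same here.
  return Math.ceil(raw / 256) * 256;
}

// e.g. 512-byte log entries, 2,000 writes/day, ~30 days retained
cappedSizeBytes(512, 2000, 30); // ≈ 30MB
```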

## Pattern 3: Compound Indexes That Match Your Queries

This is where most people get burned. MongoDB generally uses a single index per query (index intersection and `$or` are the exceptions, and neither is worth relying on). If you're querying by guild ID and sorting by date, you need a compound index.

```javascript
// Bad: Two separate indexes
db.modlogs.createIndex({ guildId: 1 })
db.modlogs.createIndex({ createdAt: -1 })

// Good: One compound index that serves both
db.modlogs.createIndex({ guildId: 1, createdAt: -1 })
```

The compound index handles `find({ guildId: "xxx" }).sort({ createdAt: -1 })` in a single index scan. With two separate indexes, the planner picks one - typically the `guildId` index - and then sorts the results in memory as a blocking SORT stage, which is orders of magnitude slower.

**Rule of thumb:** Order index fields to match your query pattern - equality filters first, then sort fields, then range filters (the ESR rule).
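
The rule extends naturally when a range filter joins the mix. A sketch (the `severity` field is hypothetical, added to show where a range predicate belongs; assumes an index of `{ guildId: 1, createdAt: -1, severity: 1 }`):

```javascript
// Equality (guildId) leads the index, the sort key (createdAt) follows,
// and the range predicate (severity) trails - the ESR ordering.
async function recentSevereLogs(modlogs, guildId, minSeverity) {
  return modlogs
    .find({ guildId, severity: { $gte: minSeverity } }) // equality + range
    .sort({ createdAt: -1 }) // satisfied by the index, no in-memory sort
    .limit(100)
    .toArray();
}
```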

## Pattern 4: Bulk Write Operations

When Chat Guard processes events, it doesn't write to MongoDB on every event. Instead, it buffers writes and flushes them in bulk.

```javascript
const writeBuffer = [];
const FLUSH_INTERVAL = 5000; // 5 seconds
const MAX_BUFFER_SIZE = 100;

function bufferWrite(operation) {
  writeBuffer.push(operation);
  if (writeBuffer.length >= MAX_BUFFER_SIZE) {
    flush();
  }
}

async function flush() {
  if (writeBuffer.length === 0) return;
  const operations = writeBuffer.splice(0); // drain the buffer atomically
  try {
    await collection.bulkWrite(operations, { ordered: false });
  } catch (err) {
    // Don't let a failed batch become an unhandled rejection from setInterval
    console.error("bulkWrite failed, dropping batch", err);
  }
}

setInterval(flush, FLUSH_INTERVAL);
```

The `{ ordered: false }` flag tells MongoDB it can execute operations in any order, which allows parallelism. For logging and analytics, order usually doesn't matter - and the throughput improvement is significant.
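
For completeness, here's the shape of what goes into that buffer - `bulkWrite` takes an array of operation documents, one wrapper object per write (the field names here are hypothetical):

```javascript
// Build one insertOne operation per moderation event; the wrapper key
// (insertOne / updateOne / deleteOne ...) tells bulkWrite what to do.
function logOperation(guildId, event) {
  return {
    insertOne: {
      document: {
        guildId,
        action: event.action,
        userId: event.userId,
        createdAt: new Date(),
      },
    },
  };
}

// Feed it through the buffer from the snippet above:
// bufferWrite(logOperation("guild_123", { action: "mute", userId: "user_111" }));
```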

## Pattern 5: Projection - Only Fetch What You Need

We see this mistake constantly: fetching entire documents when you only need two fields. With guild documents that contain nested configuration objects, this matters.

```javascript
// Bad: Fetches the entire document
const guild = await db.guilds.findOne({ _id: guildId });
const prefix = guild.prefix;

// Good: Only fetches the prefix field
const guild = await db.guilds.findOne(
  { _id: guildId },
  { projection: { prefix: 1 } }
);
```

At scale, this reduces network transfer, memory usage, and deserialization time. It adds up fast when you're processing thousands of events per second.

## Pattern 6: Schema Versioning

Over two years, the guild configuration schema changed at least 15 times. New features added, old ones deprecated, field names changed. Without schema versioning, migrations become nightmares.

```json
{
  "_id": "guild_123456789",
  "_schemaVersion": 3,
  "config": { ... }
}
```

On read, check the version. If it's outdated, run a lazy migration - update the document to the current schema and save it. This avoids expensive batch migrations and handles upgrades gracefully.
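
A sketch of that read path (the per-version migration steps are placeholders - the real schema changes aren't shown here):

```javascript
const CURRENT_SCHEMA_VERSION = 3;

// Placeholder migration steps: each entry upgrades a document from
// version v to v + 1. Real steps would rename fields, backfill, etc.
const migrations = {
  1: (doc) => ({ ...doc, features: doc.features ?? {}, _schemaVersion: 2 }),
  2: (doc) => ({ ...doc, config: doc.config ?? {}, _schemaVersion: 3 }),
};

// Pure upgrade: apply migrations until the document is current.
// Documents with no version field are treated as version 1.
function migrate(doc) {
  let current = { ...doc, _schemaVersion: doc._schemaVersion ?? 1 };
  while (current._schemaVersion < CURRENT_SCHEMA_VERSION) {
    current = migrations[current._schemaVersion](current);
  }
  return current;
}

// Lazy read path: migrate on read, then persist the upgraded document
// so each guild pays the migration cost exactly once.
async function getGuild(guilds, guildId) {
  const doc = await guilds.findOne({ _id: guildId });
  if (!doc || doc._schemaVersion === CURRENT_SCHEMA_VERSION) return doc;
  const upgraded = migrate(doc);
  await guilds.replaceOne({ _id: guildId }, upgraded);
  return upgraded;
}
```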

## What Didn't Work

- **Embedded arrays for large datasets.** MongoDB documents have a 16MB limit. An embedded array of moderation logs hits that limit faster than you'd expect.
- **GridFS for small files.** The overhead isn't worth it for files under 1MB. Store them inline as BSON binary fields or in an external object store instead.
- **Replica sets without read preference configuration.** Default read preference is `primary`, which defeats the purpose of having replicas for read scaling.
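
That last one is a one-line fix. Read preference can be set in the connection string itself, so it applies regardless of driver (the host names below are placeholders):

```javascript
// secondaryPreferred routes reads to secondaries when one is available
// and falls back to the primary otherwise - a sane default for read scaling.
const uri =
  "mongodb://rs0-a.example.com,rs0-b.example.com/?replicaSet=rs0" +
  "&readPreference=secondaryPreferred";

// Passed straight to the driver, e.g.: new MongoClient(uri)
```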

## The Bottom Line

MongoDB at scale isn't harder than SQL at scale - it's differently hard. The patterns that save you are the ones that respect MongoDB's strengths: document-shaped data, index-driven queries, and bulk operations.

The patterns that hurt you are the ones that fight against it: relational joins, unindexed queries, and single-document writes in tight loops.

---

*Running into MongoDB performance issues? Start with `explain()` on your slowest queries. The answer is almost always a missing index.*
