---
title: "MongoDB Patterns I Learned from 7,500 Discord Servers"
description: "Practical MongoDB patterns for high-throughput applications - from schema design to indexing strategies, learned the hard way running Discord bots at scale."
date: 2024-11-10T00:00:00.000Z
category: Engineering
readingTime: "3 min read"
---


When your database serves 7,500 Discord servers with millions of records, you learn MongoDB patterns that no tutorial covers. These are the patterns we wish someone had told us about before deploying to production.

## Pattern 1: The Per-Guild Document

The most natural schema for Discord bots is one document per server (guild). Each document contains that server's configuration, moderation settings, and metadata.

```json
{
  "_id": "guild_123456789",
  "prefix": "!",
  "features": {
    "automod": true,
    "logging": true,
    "welcomeMessage": false
  },
  "moderators": ["user_111", "user_222"],
  "updatedAt": "2023-01-15T10:30:00Z"
}
```

This pattern works because Discord data is naturally partitioned by guild. You almost never query across guilds - each operation is scoped to one server.

**Key insight:** Use the guild ID as `_id`. MongoDB's `_id` index is free. Don't waste it on an autogenerated ObjectId when you have a perfect natural key.
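
In practice that means the bot can upsert a default document the moment it joins a guild. A minimal sketch (helper names and default values are ours, mirroring the document above; assumes the Node.js `mongodb` driver's `updateOne`/`findOne`):

```javascript
// Default config mirroring the document shape above; the guild ID doubles
// as MongoDB's _id, so no ObjectId is ever generated.
function defaultGuildConfig(guildId) {
  return {
    _id: `guild_${guildId}`,
    prefix: "!",
    features: { automod: true, logging: true, welcomeMessage: false },
    moderators: [],
    updatedAt: new Date().toISOString(),
  };
}

// On guild join: create the document if it doesn't exist, and never
// overwrite settings an admin already changed ($setOnInsert only
// applies when the upsert actually inserts).
async function ensureGuildConfig(guilds, guildId) {
  const defaults = defaultGuildConfig(guildId);
  await guilds.updateOne(
    { _id: defaults._id },
    { $setOnInsert: defaults },
    { upsert: true }
  );
  return guilds.findOne({ _id: defaults._id });
}
```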

## Pattern 2: Capped Collections for Logs

Moderation logs grow fast. At scale, a single server might generate thousands of log entries per month. Storing all of them indefinitely is expensive and unnecessary - most admins only look at the last 30 days.

MongoDB's capped collections are purpose-built for this:

```javascript
db.createCollection("modlogs", {
  capped: true,
  size: 104857600, // 100MB
  max: 500000      // 500k documents
})
```

Capped collections automatically evict the oldest documents when the size or document limit is reached. No TTL indexes, no cleanup jobs, no manual pruning. The trade-offs: documents can't grow once inserted, and deleting individual documents wasn't supported before MongoDB 5.0 - but for append-only logs, neither restriction matters.
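
One practical question is how big to make the cap. Eviction is size-based, not time-based, so you derive the byte cap from expected write volume and a soft retention window. A back-of-the-envelope sketch (the numbers are illustrative, not from our deployment):

```javascript
// Size a capped collection from an average document size, a write rate,
// and the retention window you'd like eviction to roughly honor.
function cappedSizeBytes(avgDocBytes, docsPerDay, retentionDays) {
  const raw = avgDocBytes * docsPerDay * retentionDays;
  // MongoDB rounds the cap up to a multiple of 256 bytes; do the same here.
  return Math.ceil(raw / 256) * 256;
}

// e.g. 512-byte log entries, 2,000 writes/day, ~30 days retained
cappedSizeBytes(512, 2000, 30); // ≈ 30MB
```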

## Pattern 3: Compound Indexes That Match Your Queries

This is where most people get burned. MongoDB generally uses a single index per query (index intersection and `$or` are the exceptions, and neither is worth relying on). If you're querying by guild ID and sorting by date, you need a compound index.

```javascript
// Bad: Two separate indexes
db.modlogs.createIndex({ guildId: 1 })
db.modlogs.createIndex({ createdAt: -1 })

// Good: One compound index that serves both
db.modlogs.createIndex({ guildId: 1, createdAt: -1 })
```

The compound index handles `find({ guildId: "xxx" }).sort({ createdAt: -1 })` in a single index scan. With two separate indexes, the planner picks one - typically the `guildId` index - and then sorts the results in memory as a blocking SORT stage, which is orders of magnitude slower.

**Rule of thumb:** Order index fields to match your query pattern - equality filters first, then sort fields, then range filters (the ESR rule).
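
The rule extends naturally when a range filter joins the mix. A sketch (the `severity` field is hypothetical, added to show where a range predicate belongs; assumes an index of `{ guildId: 1, createdAt: -1, severity: 1 }`):

```javascript
// Equality (guildId) leads the index, the sort key (createdAt) follows,
// and the range predicate (severity) trails - the ESR ordering.
async function recentSevereLogs(modlogs, guildId, minSeverity) {
  return modlogs
    .find({ guildId, severity: { $gte: minSeverity } }) // equality + range
    .sort({ createdAt: -1 }) // satisfied by the index, no in-memory sort
    .limit(100)
    .toArray();
}
```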

## Pattern 4: Bulk Write Operations

When Chat Guard processes events, it doesn't write to MongoDB on every event. Instead, it buffers writes and flushes them in bulk.

```javascript
const writeBuffer = [];
const FLUSH_INTERVAL = 5000; // 5 seconds
const MAX_BUFFER_SIZE = 100;

function bufferWrite(operation) {
  writeBuffer.push(operation);
  if (writeBuffer.length >= MAX_BUFFER_SIZE) {
    flush();
  }
}

async function flush() {
  if (writeBuffer.length === 0) return;
  const operations = writeBuffer.splice(0); // drain the buffer atomically
  try {
    await collection.bulkWrite(operations, { ordered: false });
  } catch (err) {
    // Don't let a failed batch become an unhandled rejection from setInterval
    console.error("bulkWrite failed, dropping batch", err);
  }
}

setInterval(flush, FLUSH_INTERVAL);
```

The `{ ordered: false }` flag tells MongoDB it can execute operations in any order, which allows parallelism. For logging and analytics, order usually doesn't matter - and the throughput improvement is significant.
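
For completeness, here's the shape of what goes into that buffer - `bulkWrite` takes an array of operation documents, one wrapper object per write (the field names here are hypothetical):

```javascript
// Build one insertOne operation per moderation event; the wrapper key
// (insertOne / updateOne / deleteOne ...) tells bulkWrite what to do.
function logOperation(guildId, event) {
  return {
    insertOne: {
      document: {
        guildId,
        action: event.action,
        userId: event.userId,
        createdAt: new Date(),
      },
    },
  };
}

// Feed it through the buffer from the snippet above:
// bufferWrite(logOperation("guild_123", { action: "mute", userId: "user_111" }));
```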

## Pattern 5: Projection - Only Fetch What You Need

We see this mistake constantly: fetching entire documents when you only need two fields. With guild documents that contain nested configuration objects, this matters.

```javascript
// Bad: Fetches the entire document
const guild = await db.guilds.findOne({ _id: guildId });
const prefix = guild.prefix;

// Good: Only fetches the prefix field
const guild = await db.guilds.findOne(
  { _id: guildId },
  { projection: { prefix: 1 } }
);
```

At scale, this reduces network transfer, memory usage, and deserialization time. It adds up fast when you're processing thousands of events per second.

## Pattern 6: Schema Versioning

Over two years, the guild configuration schema changed at least 15 times. New features added, old ones deprecated, field names changed. Without schema versioning, migrations become nightmares.

```json
{
  "_id": "guild_123456789",
  "_schemaVersion": 3,
  "config": { ... }
}
```

On read, check the version. If it's outdated, run a lazy migration - update the document to the current schema and save it. This avoids expensive batch migrations and handles upgrades gracefully.
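
A sketch of that read path (the per-version migration steps are placeholders - the real schema changes aren't shown here):

```javascript
const CURRENT_SCHEMA_VERSION = 3;

// Placeholder migration steps: each entry upgrades a document from
// version v to v + 1. Real steps would rename fields, backfill, etc.
const migrations = {
  1: (doc) => ({ ...doc, features: doc.features ?? {}, _schemaVersion: 2 }),
  2: (doc) => ({ ...doc, config: doc.config ?? {}, _schemaVersion: 3 }),
};

// Pure upgrade: apply migrations until the document is current.
// Documents with no version field are treated as version 1.
function migrate(doc) {
  let current = { ...doc, _schemaVersion: doc._schemaVersion ?? 1 };
  while (current._schemaVersion < CURRENT_SCHEMA_VERSION) {
    current = migrations[current._schemaVersion](current);
  }
  return current;
}

// Lazy read path: migrate on read, then persist the upgraded document
// so each guild pays the migration cost exactly once.
async function getGuild(guilds, guildId) {
  const doc = await guilds.findOne({ _id: guildId });
  if (!doc || doc._schemaVersion === CURRENT_SCHEMA_VERSION) return doc;
  const upgraded = migrate(doc);
  await guilds.replaceOne({ _id: guildId }, upgraded);
  return upgraded;
}
```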

## What Didn't Work

- **Embedded arrays for large datasets.** MongoDB documents have a 16MB limit. An embedded array of moderation logs hits that limit faster than you'd expect.
- **GridFS for small files.** The overhead isn't worth it for files under 1MB. Store them inline as BSON binary fields or in an external object store instead.
- **Replica sets without read preference configuration.** Default read preference is `primary`, which defeats the purpose of having replicas for read scaling.
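
That last one is a one-line fix. Read preference can be set in the connection string itself, so it applies regardless of driver (the host names below are placeholders):

```javascript
// secondaryPreferred routes reads to secondaries when one is available
// and falls back to the primary otherwise - a sane default for read scaling.
const uri =
  "mongodb://rs0-a.example.com,rs0-b.example.com/?replicaSet=rs0" +
  "&readPreference=secondaryPreferred";

// Passed straight to the driver, e.g.: new MongoClient(uri)
```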

## The Bottom Line

MongoDB at scale isn't harder than SQL at scale - it's differently hard. The patterns that save you are the ones that respect MongoDB's strengths: document-shaped data, index-driven queries, and bulk operations.

The patterns that hurt you are the ones that fight against it: relational joins, unindexed queries, and single-document writes in tight loops.

---

*Running into MongoDB performance issues? Start with `explain()` on your slowest queries. The answer is almost always a missing index.*
