Discord servers live and die by their communities. Server admins need to know where their members are coming from - which invite links are working, which promoters are bringing real users, and which are flooding the server with fake accounts. We built a system to answer all three questions in real time.
The Invite Tracking Problem
Discord provides basic invite tracking through its API, but the data is surprisingly limited. You can see invite codes and their use counts, but you can't directly see which specific user used which invite. The API only gives you a snapshot of invite counts - not a per-join attribution.
The solution is differential tracking: capture the invite counts before and after each join event, then calculate which invite code's count increased.
// On member join:
// 1. Fetch current invite counts
// 2. Compare with cached counts from before the join
// 3. The invite whose count increased by 1 = the invite used
// 4. Update cache
This sounds simple, but at scale it gets tricky. If two members join within milliseconds of each other, the differential tracking can produce race conditions. I solved this with a per-guild mutex that serializes join event processing.
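The steps above, together with a per-guild lock, can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the bot's actual code: `findUsedInvite`, `KeyedMutex`, `handleJoin`, and the `fetchCounts` callback (a stand-in for whatever call returns the guild's current invite counts) are hypothetical names chosen for the example, and an ambiguous diff simply yields `null`.

```typescript
// Invite use counts for one guild, keyed by invite code.
type InviteCounts = Record<string, number>;

// Returns the single code whose count rose by exactly 1, or null when the
// diff is ambiguous (nothing rose, several codes rose, or one rose by more).
function findUsedInvite(before: InviteCounts, after: InviteCounts): string | null {
  const previous = (code: string): number => (code in before ? before[code] : 0);
  const increased = Object.keys(after).filter((code) => after[code] > previous(code));
  if (increased.length !== 1) return null;
  const code = increased[0];
  return after[code] - previous(code) === 1 ? code : null;
}

// Per-key async lock: tasks for the same key run one at a time, in order.
class KeyedMutex {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous: Promise<unknown> = this.tails.get(key) ?? Promise.resolve();
    // The stored tail never rejects, so the next task always gets to start.
    const result = previous.then(task);
    this.tails.set(key, result.catch(() => undefined));
    return result;
  }
}

const guildLocks = new KeyedMutex();
const inviteCache = new Map<string, InviteCounts>();

// fetchCounts is a hypothetical stand-in for the real invite lookup.
function handleJoin(
  guildId: string,
  fetchCounts: (guildId: string) => Promise<InviteCounts>
): Promise<string | null> {
  return guildLocks.run(guildId, async () => {
    const before = inviteCache.get(guildId) ?? {};
    const after = await fetchCounts(guildId);
    inviteCache.set(guildId, after); // step 4: update cache
    return findUsedInvite(before, after);
  });
}
```

Because each guild has its own promise chain, joins in different guilds still run concurrently; only joins within the same guild queue behind one another.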
Building the Attribution System
Each join produces an attribution record:
{
"guildId": "123456",
"userId": "789012",
"inviterId": "345678",
"inviteCode": "abc123",
"joinedAt": "2023-03-10T14:22:00Z",
"accountAge": "2 days",
"flags": ["new_account"]
}
This record links the new member to the person who invited them. Over time, you build a complete map of who invited whom, which is exactly what community managers need when running referral programmes or identifying the highest-value community ambassadors.
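As a rough sketch of how such records could be typed and rolled up into a who-invited-whom map, assuming they have already been collected into an array. The `AttributionRecord` interface mirrors the JSON above; `buildInviterMap` and `inviteLeaderboard` are hypothetical helpers.

```typescript
// Shape of the attribution record shown above.
interface AttributionRecord {
  guildId: string;
  userId: string;
  inviterId: string;
  inviteCode: string;
  joinedAt: string; // ISO 8601 timestamp
  accountAge: string;
  flags: string[];
}

// Groups invited user IDs under the member who invited them.
function buildInviterMap(records: AttributionRecord[]): Map<string, string[]> {
  const byInviter = new Map<string, string[]>();
  for (const record of records) {
    const invited = byInviter.get(record.inviterId) ?? [];
    invited.push(record.userId);
    byInviter.set(record.inviterId, invited);
  }
  return byInviter;
}

// Ranks inviters by how many members they brought in, highest first.
function inviteLeaderboard(
  records: AttributionRecord[]
): Array<{ inviterId: string; invites: number }> {
  const rows: Array<{ inviterId: string; invites: number }> = [];
  buildInviterMap(records).forEach((invited, inviterId) => {
    rows.push({ inviterId, invites: invited.length });
  });
  return rows.sort((a, b) => b.invites - a.invites);
}
```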
Fake Account Detection
This is where it gets interesting. Fake accounts - created in bulk to inflate server numbers or spam - have telltale patterns:
Heuristic 1: Account Age
Accounts created within the last 7 days are flagged. Accounts created within the last 24 hours are escalated. Legitimate users occasionally have new accounts, but bulk-created fake accounts almost always have creation dates within hours of joining.
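A minimal sketch of the two age thresholds, assuming the account's creation time is already known (Discord encodes it in the user ID). The `new_account` flag comes from the record shown earlier; `very_new_account` is a hypothetical name for the escalated case, as are the function and constant names.

```typescript
const AGE_HOUR_MS = 60 * 60 * 1000;
const AGE_DAY_MS = 24 * AGE_HOUR_MS;

type AgeFlag = "new_account" | "very_new_account";

// Flags accounts younger than 7 days at join time and additionally
// escalates those younger than 24 hours.
function accountAgeFlags(createdAtMs: number, joinedAtMs: number): AgeFlag[] {
  const ageMs = joinedAtMs - createdAtMs;
  if (ageMs < AGE_DAY_MS) return ["new_account", "very_new_account"];
  if (ageMs < 7 * AGE_DAY_MS) return ["new_account"];
  return [];
}
```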
Heuristic 2: Join Velocity
If the same invite link is used by 10 accounts within 5 minutes, that's suspicious. Legitimate invite sharing produces a steady trickle of joins. Fake account floods produce bursts.
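One way to detect that kind of burst is a sliding window per invite code. The sketch below assumes joins are recorded in roughly chronological order and uses the 10-joins-in-5-minutes figure from the text as its threshold; `JoinVelocityTracker` and the constant names are hypothetical.

```typescript
const BURST_WINDOW_MS = 5 * 60 * 1000; // 5 minutes
const BURST_THRESHOLD = 10; // joins per window that count as a burst

class JoinVelocityTracker {
  // Recent join timestamps (ms) per invite code, oldest first.
  private recent = new Map<string, number[]>();

  // Records a join and reports whether the invite is now in a burst.
  recordJoin(inviteCode: string, joinedAtMs: number): boolean {
    const cutoff = joinedAtMs - BURST_WINDOW_MS;
    // Drop joins that have fallen out of the window, then add this one.
    const kept = (this.recent.get(inviteCode) ?? []).filter((t) => t > cutoff);
    kept.push(joinedAtMs);
    this.recent.set(inviteCode, kept);
    return kept.length >= BURST_THRESHOLD;
  }
}
```

Filtering on every join keeps each per-invite list no longer than the number of joins in the last window, so memory stays bounded even for busy invites.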
Heuristic 3: Behavioural Signals
After joining, fake accounts exhibit predictable behaviour:
- No activity. They join and go silent. Real users at least look around.
- Immediate DM spam. Fake accounts often send direct messages to members within minutes of joining.
- Default avatar. While legitimate users can have default avatars, the combination of a new account + default avatar + burst join pattern is a strong signal.
Scoring System
Each heuristic produces a risk score between 0 and 1. The scores are weighted and combined:
- Account age < 24h: 0.4
- Burst join pattern: 0.3
- Default avatar: 0.1
- No activity after 1 hour: 0.2
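A combined score above 0.7 triggers an automatic action, configurable per server: kick, ban, quarantine role, or just flag for review. With these weights, a brand-new account arriving in a burst scores exactly 0.7, so at least one further signal (a default avatar or an hour of silence) is needed to cross the threshold.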
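To make the arithmetic concrete, here is a small sketch that treats each signal as simply present or absent, which is a simplifying assumption (a heuristic could equally return a partial score). The type and function names are hypothetical, and the weights are stored as integer hundredths so the sums stay exact.

```typescript
// Signals observed for one new member.
interface RiskSignals {
  accountUnder24h: boolean;
  burstJoin: boolean;
  defaultAvatar: boolean;
  silentAfterOneHour: boolean;
}

type ModerationAction = "kick" | "ban" | "quarantine" | "flag";

// Weights from the list above, in hundredths to avoid floating-point drift.
const SIGNAL_WEIGHTS: Record<keyof RiskSignals, number> = {
  accountUnder24h: 40,
  burstJoin: 30,
  defaultAvatar: 10,
  silentAfterOneHour: 20,
};

// Sums the weights of the signals that fired; result is between 0 and 1.
function riskScore(signals: RiskSignals): number {
  let hundredths = 0;
  (Object.keys(SIGNAL_WEIGHTS) as Array<keyof RiskSignals>).forEach((key) => {
    if (signals[key]) hundredths += SIGNAL_WEIGHTS[key];
  });
  return hundredths / 100;
}

// Returns the server's configured action when the score is strictly above
// the threshold, otherwise null (no automatic action).
function decideAction(
  signals: RiskSignals,
  configured: ModerationAction,
  threshold: number = 0.7
): ModerationAction | null {
  return riskScore(signals) > threshold ? configured : null;
}
```

Passing the action and threshold in as parameters keeps the per-server configuration separate from the scoring logic.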
Real-Time Dashboard
Admins see their invite data in real time:
- Leaderboard: Which members have invited the most people
- Invite breakdown: Which links are performing best
- Risk alerts: Flagged accounts with their risk scores
- Retention data: What percentage of invited members are still active after 7 days
The retention metric was the most popular feature. Server admins could finally see not just who was joining, but who was staying. An inviter who brings 100 members who all leave within a day is less valuable than one who brings 10 members who become active community participants.
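A sketch of how the 7-day retention figure could be computed per inviter. The definition used here is an assumption: a member counts as retained if their last recorded activity is at least 7 days after they joined, and members who joined less than 7 days ago are left out because it is too early to judge them. `InvitedMember` and `retentionByInviter` are hypothetical names.

```typescript
interface InvitedMember {
  inviterId: string;
  joinedAtMs: number;
  lastActiveAtMs: number | null; // null if the member never showed activity
}

const RETENTION_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Fraction (0 to 1) of each inviter's eligible members who were still
// active at least 7 days after joining.
function retentionByInviter(members: InvitedMember[], nowMs: number): Map<string, number> {
  const totals = new Map<string, { eligible: number; retained: number }>();
  for (const member of members) {
    if (nowMs - member.joinedAtMs < RETENTION_WINDOW_MS) continue; // too recent to judge
    const entry = totals.get(member.inviterId) ?? { eligible: 0, retained: 0 };
    entry.eligible += 1;
    if (
      member.lastActiveAtMs !== null &&
      member.lastActiveAtMs - member.joinedAtMs >= RETENTION_WINDOW_MS
    ) {
      entry.retained += 1;
    }
    totals.set(member.inviterId, entry);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, inviterId) => rates.set(inviterId, t.retained / t.eligible));
  return rates;
}
```

Under this definition, an inviter whose members all leave within a day ends up with a rate of 0 however many people they brought in.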
The Scale Challenge
Processing join events for thousands of servers means handling thousands of invite cache comparisons per minute. The invite cache alone - storing invite counts for every active invite across every server - consumed significant memory.
I moved from in-memory caching to Redis, keyed by guild ID. This allowed horizontal scaling (multiple bot shards sharing the same cache) and persistence across restarts. The trade-off was latency - Redis adds a network hop compared to in-memory access - but the reliability was worth it.
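A sketch of what a guild-keyed invite cache might look like. To keep it runnable without a Redis server, it is written against a small hypothetical `HashStore` interface whose methods are named after the Redis hash commands HGETALL, HSET, and DEL, with an in-memory stand-in; the `invites:<guildId>` key format and the function names are also assumptions.

```typescript
// Hypothetical minimal store interface, modelled on Redis HGETALL/HSET/DEL.
interface HashStore {
  hgetall(key: string): Promise<Record<string, string>>;
  hset(key: string, fields: Record<string, string>): Promise<void>;
  del(key: string): Promise<void>;
}

// Assumed key format: one hash per guild.
const inviteKey = (guildId: string): string => `invites:${guildId}`;

// Loads a guild's cached invite counts (empty object if nothing is cached).
async function loadInviteCounts(store: HashStore, guildId: string): Promise<Record<string, number>> {
  const raw = await store.hgetall(inviteKey(guildId));
  const counts: Record<string, number> = {};
  for (const code of Object.keys(raw)) counts[code] = Number(raw[code]);
  return counts;
}

// Replaces a guild's cached counts with a fresh snapshot.
async function saveInviteCounts(
  store: HashStore,
  guildId: string,
  counts: Record<string, number>
): Promise<void> {
  const fields: Record<string, string> = {};
  for (const code of Object.keys(counts)) fields[code] = String(counts[code]);
  const key = inviteKey(guildId);
  await store.del(key); // drop codes that no longer exist
  if (Object.keys(fields).length > 0) await store.hset(key, fields);
}

// In-memory stand-in so the sketch runs without a Redis server.
class MemoryHashStore implements HashStore {
  private data = new Map<string, Record<string, string>>();
  async hgetall(key: string): Promise<Record<string, string>> {
    return { ...(this.data.get(key) ?? {}) };
  }
  async hset(key: string, fields: Record<string, string>): Promise<void> {
    this.data.set(key, { ...(this.data.get(key) ?? {}), ...fields });
  }
  async del(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

In a real deployment the delete-then-write pair would need to be atomic (for example inside a transaction); the sketch skips that for brevity.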
Lessons Learned
- Differential tracking is fragile. Race conditions are real. Protect shared state with locks.
- Heuristics beat ML at this scale. I considered training a classifier, but hand-tuned heuristics were faster to implement, easier to explain to admins, and more predictable in production.
- Admins want control. Every automated action should be configurable. What's spam in one server is normal in another.
As I described in Building Discord Bots at Scale, the event processing pipeline needs to be efficient - and invite tracking is one of the most processing-intensive features in the entire bot.
Running a Discord server? Track your invites. The data tells a story your member count alone never will.