MongoDB ObjectID: anatomy of a distributed identifier

MongoDB's default ObjectID exists because distributed databases cannot afford the coordination cost of sequential integers. Rather than relying on a single authority to hand out the next number — a bottleneck that throttles write throughput across every shard and replica — MongoDB drivers generate 12-byte ObjectIDs entirely client-side, producing globally unique identifiers without any cross-node communication. This design choice is foundational to MongoDB's horizontal scalability. The ObjectID encodes a timestamp, a per-process random value, and an incrementing counter into a compact 96-bit value that is roughly time-ordered, storage-efficient, and collision-resistant across billions of documents.

Why auto-incrementing integers fail at scale

Sequential integer IDs require a centralized counter — a single document or service that every node must consult before inserting. In a distributed MongoDB cluster, this creates three compounding problems. First, every insert needs a network round-trip to the counter authority, adding latency proportional to geographic distance. Second, that counter becomes a single point of failure: if it goes down, no node can insert anything. Third, concurrent access to the counter creates race conditions. MongoDB's own engineering blog illustrates this: two threads read the current value (41), both compute 42, and both attempt to store documents with the same ID — causing a duplicate key error or a lost write. Solving this with atomic findAndModify operations serializes all inserts through a single bottleneck, destroying throughput.

ObjectID sidesteps all of this. The MongoDB driver generates the ID on the application server before the insert ever reaches the database. No coordination, no contention, no central authority. Each client independently produces unique IDs, and insertion throughput scales linearly with the number of application servers.

The 12-byte anatomy of an ObjectID

An ObjectID is 12 bytes rendered as a 24-character hexadecimal string — for example, 507f1f77bcf86cd799439011. Since a 2018 specification revision (formalized for MongoDB 3.4+), those 12 bytes break down into three segments:

Bytes 0–3: Unix timestamp (4 bytes). A 32-bit unsigned integer representing seconds since January 1, 1970 UTC, stored in big-endian byte order. This gives one-second resolution and extends to approximately the year 2106, avoiding the signed 32-bit overflow problem of 2038. Big-endian encoding is deliberate — it allows raw memcmp byte comparison to produce correct chronological ordering, which is critical for index efficiency.
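Because the timestamp occupies the first four bytes, it can be recovered from the first eight hex characters of any ObjectID with no driver at all. A minimal Python sketch, using the example ObjectID from above:

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time from bytes 0-3 of an ObjectID hex string."""
    seconds = int(oid_hex[:8], 16)  # big-endian 32-bit unsigned Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(objectid_timestamp("507f1f77bcf86cd799439011"))  # an October 2012 instant
```

This is exactly what drivers' getTimestamp() helpers do under the hood.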

Bytes 4–8: Per-process random value (5 bytes). Generated once when the driver process starts and held constant for the lifetime of that process. This value does not need to be cryptographically secure; a standard PRNG seeded with OS entropy suffices. The 5-byte width provides ~1.1 trillion possible values, making cross-process collisions statistically negligible.

Bytes 9–11: Incrementing counter (3 bytes). Initialized to a random value at driver startup and incremented by one for each ObjectID generated. Stored in big-endian order. The 3-byte counter supports 16,777,216 unique ObjectIDs per process per second before wrapping to zero.
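The three segments make the generation routine short enough to sketch. The following Python approximation of the post-3.4 layout is illustrative only, not a substitute for the bson library's ObjectId: the random value and counter seed are fixed once per process, and only the timestamp and counter change per call.

```python
import os
import struct
import threading
import time

_random5 = os.urandom(5)                          # per-process random value, bytes 4-8
_counter = int.from_bytes(os.urandom(3), "big")   # counter starts at a random value
_lock = threading.Lock()

def new_objectid() -> str:
    """Generate a 24-character hex ObjectID: timestamp + random + counter."""
    global _counter
    with _lock:
        _counter = (_counter + 1) % (1 << 24)     # 3-byte counter wraps at 16,777,216
        count = _counter
    ts = struct.pack(">I", int(time.time()))      # bytes 0-3: big-endian Unix seconds
    return (ts + _random5 + count.to_bytes(3, "big")).hex()

print(new_objectid())  # e.g. a 24-character hex string
```

Note that no step contacts the database: uniqueness falls out of the layered timestamp/random/counter structure alone.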

The pre-3.4 format split bytes 4–8 differently: 3 bytes for an MD5 hash of the machine hostname plus 2 bytes for the process ID. MongoDB retired this layout because MD5 cannot be used on FIPS-compliant systems, virtual machines cloned from identical images produced identical machine hashes, and different language drivers implemented these fields inconsistently. Merging machine ID and process ID into a single 5-byte random value solved all three problems while maintaining uniqueness guarantees.

Why ObjectID excels in distributed architectures

The three-segment design creates a layered uniqueness guarantee without any distributed consensus. Documents created in different seconds differ in the timestamp. Documents from different processes in the same second differ in the random value. Documents from the same process in the same second differ in the counter. The only theoretical collision requires a single process to generate more than 16.7 million ObjectIDs within one second — an extreme edge case.

Beyond uniqueness, ObjectID provides roughly monotonic ordering because the leading bytes are a timestamp. Sorting by _id approximates sorting by creation time, and range queries on _id can efficiently select documents from a time window. The embedded timestamp also eliminates the need for a separate createdAt field in many applications — any ObjectID's creation time is extractable via ObjectId.getTimestamp() with one-second precision.
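Because the timestamp leads, a time-window query on _id only needs a boundary ObjectID built from the window's start time with the trailing eight bytes zeroed. A hedged Python sketch (the collection and find call in the comment are illustrative):

```python
import struct
from datetime import datetime, timezone

def objectid_floor(dt: datetime) -> str:
    """Smallest possible ObjectID hex for a given instant: 4-byte timestamp + 8 zero bytes."""
    ts = struct.pack(">I", int(dt.timestamp()))
    return ts.hex() + "00" * 8

start = objectid_floor(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(start)
# Illustrative query for documents created on or after 2024-01-01 UTC:
# collection.find({"_id": {"$gte": ObjectId(start)}})
```

Since the bytes compare in chronological order, the index range scan this produces is as efficient as one over a dedicated createdAt field, at one-second granularity.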

At 12 bytes, ObjectID is more compact than a binary UUID's 16 bytes (25% smaller) or a string UUID's 36 bytes. Since every MongoDB document carries an _id field with a mandatory unique index, this size difference compounds across millions of documents — affecting index memory footprint, disk usage, and WiredTiger cache efficiency.

Two caveats apply. ObjectIDs created within the same second from different machines have no guaranteed ordering, and clients with skewed system clocks can produce out-of-order timestamps.

Performance costs of random IDs versus monotonic ones

The performance gap between ObjectID and random identifiers like UUIDv4 stems from B-tree index mechanics in WiredTiger, MongoDB's default storage engine. ObjectID's roughly monotonic nature means new entries append to the right edge of the B-tree, keeping the hot working set confined to a few rightmost leaf pages. Only these pages need to reside in the WiredTiger cache for write-heavy workloads.

Random UUIDs scatter inserts across the entire B-tree. Each random insert may land on a page not currently in cache, forcing a disk read. When a leaf page fills, it splits — and random UUIDs cause 500× more page splits than sequential IDs (roughly 5,000–10,000 splits per million records versus 10–20). Each split triggers cascading rebalancing up the tree, producing 5–10× write amplification per logical insert.

Benchmarks confirm the theory. A Go benchmark against MongoDB 6.0.5 with 1 million batched inserts (batch size 10,000) measured ObjectID at 3.32 seconds, ULID at 3.52 seconds, and UUID at 5.16 seconds — 55% slower. The gap widens dramatically at scale: inserting 10 million documents into a collection already containing 10 million took ObjectID about 32 seconds versus over 63 seconds for UUID. Separate benchmarks report UUIDs sustaining roughly 2,000 inserts per second at 10–20 million documents where ObjectIDs sustain 7,500 — a 3.75× throughput difference.

Index size also matters. ObjectID occupies 12 bytes per key; a binary UUID takes 16 bytes (33% more); a string UUID takes 36 bytes (3× more). With MongoDB's typical 4 KB index pages, ObjectID fits approximately 317 entries per page versus fewer for larger key types, directly affecting how much of the index the cache can hold.

Custom IDs are fully supported and sometimes preferable

MongoDB's _id field accepts any BSON data type except arrays, regex, or undefined — strings, integers, UUIDs, dates, embedded documents, and more. To use a custom ID, simply include _id in the document at insert time. If omitted, the driver auto-generates an ObjectID.

The strongest case for custom IDs is natural keys that will never change. Using an email address, username, or external system identifier as _id eliminates the need for a separate unique index on that field, saving both storage and write overhead. MongoDB's documentation explicitly recommends this: "Use a natural unique identifier, if available. This saves space and avoids additional indexes."

Compound _id values — embedded documents like { region: "US", order: 12345 } — work well for data with natural composite keys. In application code, the implementation is straightforward across drivers:

// Node.js
await collection.insertOne({ _id: "user@example.com", name: "Alice" });

// Mongoose schema with custom _id
const schema = new mongoose.Schema({
  _id: { type: String, required: true },
  name: String
});

# PyMongo
collection.insert_one({ '_id': 'user@example.com', 'name': 'Alice' })

MongoDB's Node.js driver also supports a pkFactory option that auto-generates custom IDs when none is provided, enabling application-wide ID strategies without modifying every insert call.

Choosing an ID strategy

The decision framework rests on five dimensions, each favoring a different ID strategy.

Write performance and scalability strongly favor monotonic IDs. ObjectID and time-ordered alternatives (UUIDv7, ULID, KSUID) maintain right-edge B-tree appends. Random UUIDs (v4) degrade catastrophically beyond 10 million documents. For write-heavy workloads at scale, this is often the deciding factor.

Privacy and security favor random UUIDs. ObjectID's first four bytes encode the creation timestamp in a trivially decodable format. Exposing ObjectIDs in URLs or APIs reveals when every document was created, enabling enumeration attacks and leaking usage patterns. UUIDv4 reveals nothing. UUIDv7 encodes time but in a less immediately obvious format.

Portability across databases favors UUIDs. ObjectID is MongoDB-specific. Applications that might migrate to PostgreSQL, MySQL, or another database benefit from the universal UUID standard. MongoDB's own blog acknowledges this: "Some businesses may be reluctant to link their application logic to an identifier generated by a specific database product."

Storage efficiency favors ObjectID (12 bytes) over binary UUID (16 bytes) and dramatically over string UUID (36 bytes). For collections with hundreds of millions of documents, the 25% index size reduction versus binary UUID translates to meaningful memory and disk savings.

Sharding behavior requires special attention regardless of ID type. Monotonically increasing IDs (including ObjectID) used directly as a shard key route all inserts to a single chunk, creating hotspots. The solution is hashed sharding ({ _id: "hashed" }) or compound shard keys. Random UUIDs distribute writes evenly but sacrifice range query efficiency.
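The hashed shard key mentioned above is declared when the collection is sharded. A mongosh sketch, where the database and collection names are illustrative:

```
// Hash _id so monotonically increasing ObjectIDs spread across chunks
sh.shardCollection("mydb.orders", { _id: "hashed" })
```

With a hashed key, consecutive ObjectIDs land on different chunks, trading away _id range-scan locality for even write distribution.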

Conclusion

ObjectID is MongoDB's default for a reason: it delivers globally unique, roughly time-ordered, compact identifiers without any distributed coordination — the exact properties a horizontally scalable database needs. Its 12-byte structure packs a timestamp, process-unique random value, and counter into a format optimized for B-tree index performance and memcmp sorting.

Custom IDs make sense in specific scenarios — natural keys that eliminate redundant indexes, UUIDs required for cross-database portability, or random IDs needed to prevent timestamp leakage. The critical insight is that ID ordering matters more than ID type: any time-ordered identifier (ObjectID, UUIDv7, ULID) will dramatically outperform random ones at scale due to B-tree mechanics. Applications choosing custom IDs should prioritize monotonic or semi-monotonic strategies, store UUIDs as binary rather than strings, and test write performance at target collection sizes before committing to an ID scheme.