03 · 07
In coming up with our horizontal sharding technique i did a large amount of research reading through a ton of the best practices out there.
Ultimately the base goals are
- Provide a mechanism for creating unique ids across all shards (so when you rebalance you don't have to rekey all your data)
- Provide a lookup table mapping rows to shards for fast lookups on at least one id.
There are bonus goals like:
- Getting time sorted results
- Keys that fit into a 64 bit space (for redis and index sizes)
The properies that make a solution great in my mind is simplicy of implementation, maintence, and resilence to errors. Performance always takes a back seat to our ability to work reliably and understand what we are working on.
Ways to generate Ids: (Heavily helped by instagram blog post below.)
Approach 1: Ticket Server (like flickr)
Creating a dedicated service that on request generates ids. Flickr uses odd-even methodology to provide high availablity but the idea can scale out to any number of ticket servers with initial coordination. For example 1+ 3n, 2+3n, 3+3n where n is each request would generate 3 sources of unique autoincremented ids.
Pros:
Cons:
- External service: potental for latency, maintenance or bottlenecks
- Ids are not strictly time sortable
Approach 2: Timestamp-shardId-modulo(autoincrement) aka instagram sharding
See the instagram blog link above.
Pros:
- No external service required (no latency, maintenance or bottlenecks)
Cons:
- Reliance on making sure timekeeping is accurate between shards
- Has known limitations (a death clock, number of writes per second/shard) All can be made reasonably high though.
Approach 3: Guids
Automatically generated guaranteed unique ids.
Pros:
- No external service required (no latency, maintenance or bottlenecks)
- Extremely easy to generate and understand
Cons:
Approach 4: Preallocation registry and allocation
Ahead of time allocate a set of id-chunks in batches of 10K. Take each batch and assign them to a given shard. Shard then manages its chunk with its own internal counter (autoincrement) + batch start point. As batches are used up new batches are allocated on demand and shards continue generating ids but with a new offset.
Pros:
- Solves both id allocation and row mapping
- Lends itself to mysql database-per-shard best practices
Cons:
- Requires a allocation service (potential for latency, maintenance and bottlenecks)
- Slightly complex when orchestrating block allocation and id counters.
Approach 4.1 Preallocation registry and fixed size shards
Again you generate a set of id-chunks in batches of 100K or so. Take each batch and assign it to a shard. Shard starts its autoincrementing at that offset but simply registers back as being full when it exhausts its allocation. Application layer would need to get the next shard for new writes.
Pros:
- Extremely simple to implement on the db.
- Lends itself to best practices with mysql database-per-shard techniques.
- Solves the row to shard mapping at the sametime.
- Doesn't require a external service as the allocation can be coded in a lookup table or configuration file on deploy.
- Get the 64 bit ids while your id ranges stay below 2^64 unsigned.
Cons:
- Just a bit more complex at the application level with insert error checking and fail over into a new shard. (can be avoided for a long time with uniform distribution of writes across all shards and a large set of shards).
- Don't get a natural sort by time created across all shards.
Knowledge needed to make this work:
You will need to set two autoincrementing null false fields (one as the primary id key) and another as the table size limit counter. You will need to offset the ids appropriately. For the primary key id you will want to offset to the starting block for the db. For the table size limit you will want to offset to 100K less than the max for a medium int (its different for signed or unsigned). If the CHECK constraint were implemented we would use that instead.
Error produced on autoincrement overflow
How to set a starting point for a autoincremented id
Ways to map rows to shards
Pretty much by necessity this must be a table or hash of somesort.
The approaches pretty much boil down to either a remote service for the lookup or maintaining a small enough and constant enough table to include as part of the application layer. Depending on how you picked ids above your choice will be picked for you.