I love simple, lightweight, small, minimal tools that just do the bare minimum. Based on that, does anyone have any good recommendations for a key value store that is:
- sharded (so if I have 5 instances and 100 keys, each node will roughly have 20 keys on it).
- easy to join nodes: as in
kv-server --join somehost:1111
For reference, I think Consul is too heavy (and not sharded I believe).
It would be great to have a small Go executable that I can run on 10 servers, all connected up, that exposes a Redis-like API. Simple GET, PUT and STREAM would be great.
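To make the "5 instances, 100 keys, roughly 20 each" idea concrete, here is a minimal sketch of hash-based sharding. It assumes a plain FNV hash taken modulo the node count; `shardFor` and `distribute` are hypothetical names for illustration and are not taken from any existing tool.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of nodeCount nodes by hashing the key.
// Hypothetical helper for illustration only.
func shardFor(key string, nodeCount int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(nodeCount))
}

// distribute counts how many of the given keys land on each node.
func distribute(keys []string, nodeCount int) []int {
	counts := make([]int, nodeCount)
	for _, k := range keys {
		counts[shardFor(k, nodeCount)]++
	}
	return counts
}

func main() {
	// 100 made-up keys spread over 5 nodes.
	keys := make([]string, 100)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}
	fmt.Println(distribute(keys, 5)) // per-node key counts, summing to 100
}
```

The split is only roughly even, and with a plain modulo most keys move to a different node whenever a node joins or leaves. Consistent hashing is the usual way to limit that movement.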
@prologic anyone else here I can ping?
@prologic ever done any stress testing on bitraft? In a cluster, do you know what the throughput would be? Like, PUTs per second and GETs per second?
@markwylde No but I could do some testing and publish the results 👌
As for the sharding though… Let’s discuss this?
@prologic I’m happy to do it. Might try now actually. It was just in case you knew. I’ll post in the README if I get it working. I’m hoping redis-benchmark will work since it’s got the same API as Redis.
I wonder if sharding could be implemented by:
- redis can broadcast to all nodes in the cluster
- REPLICA_COUNT is 3
- a PUT gets forwarded to REPLICA_COUNT random nodes in the cluster
- a broadcast is made to the cluster saying “I NEED A VALUE FOR KEY ‘TEST’”
- all nodes that contain that value reply to the server
- the first response gets forwarded to the client
- the other responses are discarded
I’m sure there would be some edge cases, like syncing.
- What if 1 of the random nodes is full and therefore only REPLICA_COUNT-1 nodes received the document?
- This could mean 2 nodes have the new value, but the 3rd has the old value.
Maybe it could be solved by only committing once REPLICA_COUNT nodes successfully receive the message.
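The idea above can be sketched as an in-memory simulation. This is only a rough illustration under stated assumptions: `node`, `cluster`, `capacity`, `put`, `get` and `copies` are hypothetical names, a loop over the nodes stands in for the network broadcast, and "full" is modelled as a maximum key count.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

const replicaCount = 3 // REPLICA_COUNT from the proposal

// node is a hypothetical in-memory stand-in for one server.
type node struct {
	data     map[string]string
	capacity int // max number of keys; models a "full" node
}

// canAccept reports whether the node can store key without exceeding capacity.
func (n *node) canAccept(key string) bool {
	_, exists := n.data[key]
	return exists || len(n.data) < n.capacity
}

type cluster struct{ nodes []*node }

func newCluster(nodeCount, capacity int) *cluster {
	c := &cluster{}
	for i := 0; i < nodeCount; i++ {
		c.nodes = append(c.nodes, &node{data: map[string]string{}, capacity: capacity})
	}
	return c
}

// put picks replicaCount random nodes and writes only if all of them accept.
func (c *cluster) put(key, value string) error {
	if len(c.nodes) < replicaCount {
		return errors.New("not enough nodes")
	}
	picked := rand.Perm(len(c.nodes))[:replicaCount]
	// Phase 1: ask every picked node whether it can take the write.
	for _, i := range picked {
		if !c.nodes[i].canAccept(key) {
			return errors.New("a replica is full, write not committed")
		}
	}
	// Phase 2: commit on all picked nodes.
	for _, i := range picked {
		c.nodes[i].data[key] = value
	}
	return nil
}

// get "broadcasts" to all nodes; the first answer wins, the rest are discarded.
func (c *cluster) get(key string) (string, bool) {
	for _, n := range c.nodes {
		if v, ok := n.data[key]; ok {
			return v, true
		}
	}
	return "", false
}

// copies counts how many nodes currently hold key.
func (c *cluster) copies(key string) int {
	total := 0
	for _, n := range c.nodes {
		if _, ok := n.data[key]; ok {
			total++
		}
	}
	return total
}

func main() {
	c := newCluster(10, 100)
	if err := c.put("TEST", "hello"); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	v, ok := c.get("TEST")
	fmt.Println(v, ok, "copies:", c.copies("TEST"))
}
```

One edge case this sketch does not handle: a second PUT for the same key picks a fresh set of random nodes, so the earlier replicas keep the old value and a GET may return either one.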
@markwylde If you could benchmark this that would be wonderful! 👌 Also, reading your thoughts on “Sharding”, I think you might be slightly confused, because what you just described is essentially “High Availability”, and not Sharding.
In fact Bitraft already has this anyway. It fully supports forming a High Availability Cluster.
But in Bitraft every node contains every key + value, right? I probably wasn’t clear above, but in my idea REPLICA_COUNT would be 3 but the NODE_COUNT may be 10. So a put would go to 3 of 10 of the nodes.
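A rough sketch of the "3 of 10 nodes" placement, swapping the random choice for rendezvous (highest-random-weight) hashing so that any node can work out which replicas hold a key without broadcasting. This is a different technique from the random placement described above, and the names `score`, `replicasFor` and the node names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// score hashes the (node, key) pair; the highest scores win the key.
func score(node, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0}) // separator so ("a","bc") and ("ab","c") hash differently
	h.Write([]byte(key))
	return h.Sum64()
}

// replicasFor returns the replicaCount nodes with the highest score for key.
func replicasFor(key string, nodes []string, replicaCount int) []string {
	sorted := append([]string(nil), nodes...) // copy, leave the input untouched
	sort.Slice(sorted, func(i, j int) bool {
		si, sj := score(sorted[i], key), score(sorted[j], key)
		if si != sj {
			return si > sj
		}
		return sorted[i] < sorted[j] // tie-break on name for determinism
	})
	if replicaCount > len(sorted) {
		replicaCount = len(sorted)
	}
	return sorted[:replicaCount]
}

func main() {
	// NODE_COUNT = 10, REPLICA_COUNT = 3, with made-up node names.
	nodes := make([]string, 10)
	for i := range nodes {
		nodes[i] = fmt.Sprintf("node-%d", i)
	}
	fmt.Println(replicasFor("TEST", nodes, 3))
}
```

With 3 replicas across 10 nodes, each node ends up holding roughly 30% of the keys instead of all of them.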
Did a quick benchmark:
Seems the summary benchmark of a 5-node cluster on my laptop is:
GET: 1165.64 requests per second
SET: 1061.80 requests per second