Lab 4 Intro + Part 1 (ShardMaster)
CS505 Spring 2021

Lab 3 Wrap-up
● Lab 4 relies heavily on lab 3
● However, even if your lab 3 implementation isn’t perfect, you can still work on lab 4
  ○ But definitely work on finishing lab 3 first! Especially any correctness bugs.
  ○ Some liveness issues may not trigger problems in lab 4.
  ○ ShardMaster (lab 4.1) doesn’t depend on Paxos
  ○ Some 4.2/4.3 tests use Paxos groups of size 1, so you can still pass those without a perfect Paxos implementation
  ○ You can also debug the rest of the 4.2/4.3 tests by changing them to use Paxos groups of size 1, to test the actual lab 4 logic separately from the Paxos logic

Lab 4 Overview
Goal: Build a “linearizable, sharded key-value store with multi-key updates and dynamic load balancing, similar in functionality to Amazon’s DynamoDB or Google’s Spanner”.

Lab 4 Overview
● Paxos increases reliability; sharding increases performance and scalability
● Part 1 implements the ShardMaster application (handles load balancing)
● Part 2 implements a sharded KV store and handles moving shards
● Part 3 adds multi-key transaction support
  ○ This means a single request can update multiple keys in different shards residing in different Paxos groups, while maintaining linearizability
  ○ Implemented using two-phase commit

Shards vs Groups
[Diagram: three Paxos groups (0, 1, 2), each made up of peers 0–2, together serving shard 1 (a-d), shard 2 (e-h), shard 3 (i-l), shard 4 (m-p), shard 5 (q-t), shard 6 (u-x), and shard 7 (y-z).]

Sharding: what is it?
● Divides the keyspace (the K in K/V) into multiple groups, called shards
  ○ Keys can be sharded on many things (alphabetically, random/hashed, load-balanced, etc.)
● Each shard is handled by a group of servers. Each group:
  ○ Runs Paxos from lab 3.
So we can assume a group will not fail :)
  ○ Stores all key/value pairs in the database that correspond to its shard
  ○ Accepts/responds to client requests that correspond to its shard
● Since different sharding groups can run in parallel without communicating, performance increases in proportion to the number of shards
● Lab 4: You won’t have to change which keys go into which shards. The number of shards stays the same.

Terminology
● Shard Master
  ○ “Application” replicated by Paxos
  ○ Service that responds to changes in configuration (new Paxos groups being added, removed, etc.)
● Configuration:
  ○ Similar to a view in primary/backup lab 2
  ○ Specifies which groups are responsible for which shards
  ○ Has a configuration number (monotonically increasing)
● Paxos Replica Group
  ○ Group of servers performing Paxos agreement with each other — just like lab 3
  ○ Handles key/value storage for its assigned shards
● Shard
  ○ In charge of a subset of key/value pairs,
    ■ e.g. a shard that stores all keys starting with “a”, or one that stores all keys from “a-g”
  ○ Shards are numbered 1...numShards

Lab 4 Sharding/partitioning
[Diagram: clients send Put(A, 0), Get(B), and Append(C, 123) to three Paxos groups (0, 1, 2), each made up of peers 0–2.]

ShardMaster
● A service to keep track of which groups serve which shards
● Necessary because:
  ○ Clients need to be able to figure out what group to send requests to (i.e.
which replica group is responsible for a given key)
  ○ We might want to reconfigure the system (inducing redistribution of shards)
    ■ Add/remove a Paxos replica group
    ■ Move a shard to another group (for testing or, in practice, load-balancing popular keys)
● Conceptually similar to the View Server in primary/backup
● Changing the config is simpler than changing the view in lab 2 because you can assume the “primary” is fault-tolerant

ShardMaster continued
● Keeps track of a current configuration object (ShardConfig):
  ○ private final int configNum;
  ○ private final Map<Integer, Pair<Set<Address>, Set<Integer>>> groupInfo;
    ■ Integer: group id
    ■ Set<Address>: addresses of all members in that group
    ■ Set<Integer>: all the shard numbers the group is responsible for
● Also remembers all old configurations
  ○ Does not need to be garbage collected
  ○ Query can ask for any past configuration (see slide 16)
  ○ For every configuration number, we want to store a configuration object like the one above

ShardMaster Application
● The ShardMaster class is an Application
● Accepts 4 command types:
  ○ Join
  ○ Leave
  ○ Move
  ○ Query
● Responds with 3 reply types:
  ○ Ok
  ○ Error
  ○ ShardConfig
● You’ll only need to create Query commands (test code calls the others for you)

Join
● The way new replica groups are added to the system
● Join commands contain:
  ○ Integer for the replica group ID (must be unique, or ERROR is the result)
  ○ Set of server addresses that should be in the group
● ShardMaster responds by creating a new configuration
  ○ New config includes the new shard group
  ○ Redistributes the shards among the updated set of groups
    ■ Should move as few shards as possible ← this will be a little bit tricky (think about vnodes)
  ○ Returns Ok result

Leave
● Command contains: group ID that should “leave”
● Opposite of Join: “deletes” a group from the system
● ShardMaster must redistribute the group’s shards to the other groups
● Should still move as few shards as possible
● OK on success, ERROR when
  ○ the current config does not contain the group, or
  ○ the final group is trying to leave (not actually tested)

Move
● Command contains:
  ○ Shard number
  ○ Replica group ID (which group the shard should be moved to)
● Moves a shard from one Paxos replica group to another
● Practically, helpful for load balancing in the real world — operations on really hot keys perhaps should be more isolated than other keys!
● Returns Ok on success, ERROR when the current config does not contain the group

Query
● Command contains: a configuration number
● Should reply with a ShardConfig
● Returns the configuration for a specific configuration number
  ○ e.g. a server is outdated and needs to catch up on all the missed configurations
● If the number is -1 OR larger than the largest known configuration number, the ShardMaster should reply with the latest configuration.

Join Example
Configuration 0 — Join(1)
Shard: 1    2    3    4    5    6    7    8    9    10
Group: null null null null null null null null null null

Join Example
Configuration 1 — Join(2)
Shard: 1  2  3  4  5  6  7  8  9  10
Group: 1  1  1  1  1  1  1  1  1  1

Join Example
Configuration 2 — Join(5)
Shard: 1  2  3  4  5  6  7  8  9  10
Group: 1  1  1  1  1  2  2  2  2  2

Join Example
Configuration 3
Shard: 1  2  3  4  5  6  7  8  9  10
Group: 1  1  1  5  5  2  2  2  2  5

Some series of transitions occur...

Leave Example
Configuration 10 — Leave(1)
Shard: 1  2  3  4  5  6  7  8  9  10
Group: 1  1  5  2  2  7  7  4  6  6

Leave Example
Configuration 11
Shard: 1  2  3  4  5  6  7  8  9  10
Group: 5  4  5  2  2  7  7  4  6  6
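The minimal-movement redistribution shown in the Join and Leave examples above can be sketched roughly as below. This is an illustrative helper, not the lab’s required API: the method name `rebalance`, the `groupToShards` map, and the two-phase release/take structure are all assumptions; your ShardMaster will likely store this state inside a ShardConfig instead.

```java
import java.util.*;

// Illustrative sketch (assumed names, not the lab's API): rebalance shard
// assignments so every group ends up with an even share, moving as few
// shards as possible.
public class Rebalance {
    // groupToShards maps group id -> shard numbers it currently serves;
    // shards are numbered 1..numShards. Shards owned by no group (e.g. after
    // the first Join, or freed by a Leave) are handed out from a pool.
    static void rebalance(SortedMap<Integer, Set<Integer>> groupToShards, int numShards) {
        if (groupToShards.isEmpty()) {
            return;
        }
        // Deterministic group order, so every replica computes the same config.
        List<Integer> gids = new ArrayList<>(groupToShards.keySet());
        int base = numShards / gids.size();   // every group gets at least `base`
        int extra = numShards % gids.size();  // the first `extra` groups get one more

        // Pool of shards not currently owned by any group.
        TreeSet<Integer> pool = new TreeSet<>();
        for (int s = 1; s <= numShards; s++) {
            pool.add(s);
        }
        for (Set<Integer> owned : groupToShards.values()) {
            pool.removeAll(owned);
        }

        // Phase 1: overfull groups release shards into the pool.
        for (int i = 0; i < gids.size(); i++) {
            int target = base + (i < extra ? 1 : 0);
            Set<Integer> owned = groupToShards.get(gids.get(i));
            while (owned.size() > target) {
                Integer s = owned.iterator().next();
                owned.remove(s);
                pool.add(s);
            }
        }
        // Phase 2: underfull groups take shards from the pool.
        for (int i = 0; i < gids.size(); i++) {
            int target = base + (i < extra ? 1 : 0);
            Set<Integer> owned = groupToShards.get(gids.get(i));
            while (owned.size() < target) {
                owned.add(pool.pollFirst());
            }
        }
    }
}
```

Because only groups above their target give up shards and only groups below it take shards, the number of moves is exactly the total excess, which is the minimum needed to reach an even split. After Join(2) in the example above, one group of 10 shards releases 5 into the pool and the new group takes them; which particular 5 move depends on iteration order, and the tests only check the counts and minimality, not the exact assignment.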