RegionServer Failover
When no RegionServers are failing, keeping track of the logs in ZooKeeper adds no value. Unfortunately, RegionServers do fail, and since ZooKeeper is highly available, it is useful for managing the transfer of the queues in the event of a failure.
Each of the master cluster RegionServers keeps a watcher on every
          other RegionServer, in order to be notified when one becomes unavailable just as the
          master does. When a failure happens, they all race to create a znode called
            lock inside the unavailable RegionServer znode that contains its
          queues. The RegionServer that creates it successfully then transfers all the queues to
          its own znode, one at a time since ZooKeeper does not support renaming queues. After all
          queues are transferred, they are deleted from the old location. The recovered znodes are
          then renamed with the slave cluster ID appended to the name of the server that
          failed.
Next, the master cluster RegionServer creates one new source thread per copied queue. Each of the source threads follows the 'read/filter/ship pattern.' Those queues never receive new data because they do not belong to their new RegionServer. When the reader hits the end of the last log, the queue znode is deleted and the master cluster RegionServer closes that replication source.
For example, the following hierarchy represents what the znodes layout
          might be for a master cluster with 3 RegionServers that are replicating to a single slave
          with the ID of 2. The RegionServer znodes contain a
            peers znode that contains a single queue. The znode names in the
          queues represent the actual file names on HDFS in the form
            address,port.timestamp:
/hbase/replication/rs/
  1.1.1.1,60020,123456780/
    2/
      1.1.1.1,60020.1234  (Contains a position)
      1.1.1.1,60020.1265
  1.1.1.2,60020,123456790/
    2/
      1.1.1.2,60020.1214  (Contains a position)
      1.1.1.2,60020.1248
      1.1.1.2,60020.1312
  1.1.1.3,60020,    123456630/
    2/
      1.1.1.3,60020.1280  (Contains a position)
      Assume that 1.1.1.2 loses its ZooKeeper session. The survivors race to create a lock, and, arbitrarily, 1.1.1.3 wins. It then starts transferring all the queues to the znode of its local peers by appending the name of the server that failed. Right before 1.1.1.3 is able to clean up the old znodes, the layout looks like the following:
/hbase/replication/rs/
  1.1.1.1,60020,123456780/
    2/
      1.1.1.1,60020.1234  (Contains a position)
      1.1.1.1,60020.1265
  1.1.1.2,60020,123456790/
    lock
    2/
      1.1.1.2,60020.1214  (Contains a position)
      1.1.1.2,60020.1248
      1.1.1.2,60020.1312
  1.1.1.3,60020,123456630/
    2/
      1.1.1.3,60020.1280  (Contains a position)
    2-1.1.1.2,60020,123456790/
      1.1.1.2,60020.1214  (Contains a position)
      1.1.1.2,60020.1248
      1.1.1.2,60020.1312
      Some time later, but before 1.1.1.3 is able to finish replicating the last WAL from 1.1.1.2, it also becomes unavailable. Some new logs were also created in the normal queues. The last RegionServer then tries to lock 1.1.1.3’s znode and begins transferring all the queues. Then the new layout is:
/hbase/replication/rs/
  1.1.1.1,60020,123456780/
    2/
      1.1.1.1,60020.1378  (Contains a position)
    2-1.1.1.3,60020,123456630/
      1.1.1.3,60020.1325  (Contains a position)
      1.1.1.3,60020.1401
    2-1.1.1.2,60020,123456790-1.1.1.3,60020,123456630/
      1.1.1.2,60020.1312  (Contains a position)
  1.1.1.3,60020,123456630/
    lock
    2/
      1.1.1.3,60020.1325  (Contains a position)
      1.1.1.3,60020.1401
    2-1.1.1.2,60020,123456790/
      1.1.1.2,60020.1312  (Contains a position)
   
