PSYNC3: next improvements to Redis replication #4357

antirez opened this issue Oct 5, 2017 · 7 comments

@antirez (Contributor) commented Oct 5, 2017

This issue is called PSYNC3 because, as PSYNC2 identified a set of different replication improvements, the ones described here are the next improvements planned for Redis replication. The main focus of the improvements described here is:

  1. To allow AOF to also retain replication metadata like RDB does.
  2. Make Redis Sentinel and Redis Cluster failovers safe when master instances are configured without persistence, or with limited persistence (RDB).

This issue deprecates #2087 because, AFAIK, this is just a better version of what was proposed there. PSYNC3 features are based on improvements in PSYNC2 that were not available back when protected restarts were proposed.

The individual features that compose PSYNC3 are:

  1. AOF annotations with replication IDs and offsets.
  2. A Redis Sentinel and Cluster feature that marks rebooted master instances as failed, triggering a failover.
  3. A Redis replication change that allows slaves to only connect and continue the replication from instances that are successors of the same replication history.

AOF annotations

Starting with PSYNC2 in Redis 4.0, RDB files are able to store replication information such as the replication ID and offset. This allows many things, including, after a reboot, continuing the replication incrementally from the master without a full resynchronization.

AOF should optionally be able to do the same. For every replicated command, we should be able to also emit the corresponding master replication offset. Moreover, when the AOF is created, if empty, and every time the replication ID changes, the new replication ID should be emitted as well.

One possibility is for the AOF file to declare, immediately at its start, that it is an annotated AOF file, via a single command like AOFCONFIG annotated 1. When this option is turned on, the first argument of every AOF command describes the replication offset after the command execution. So conceptually every command looks like the following:

932434 SET foo bar
932460 INCR mykey

Moreover when the replication ID/offset change because of a new PSYNC attempt, a replication role change, or any other similar event, a REPLCONF SET-ID-AND-OFFSET is emitted in the AOF file.
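
To make the format concrete, here is a minimal sketch in C of how a loader could consume the per-command offset prefix. The state struct and function names here are hypothetical, not the actual Redis internals:

```c
#include <stdlib.h>

/* Hypothetical loader state; not the real Redis data structures. */
typedef struct {
    int annotated;                /* Set once "AOFCONFIG annotated 1" is seen. */
    long long master_repl_offset; /* Last offset restored from annotations. */
} aofLoadState;

/* Called for every command parsed out of the AOF file. Normally argv[0] is
 * the command name; with annotations on, argv[0] is the replication offset
 * reached *after* executing the command, and the real command follows. */
void aofLoadCommand(aofLoadState *state, int argc, char **argv) {
    if (state->annotated) {
        state->master_repl_offset = strtoll(argv[0], NULL, 10);
        argv++; argc--; /* Skip the annotation: the command starts here. */
    }
    /* ... execute the remaining argc/argv as a normal AOF command ... */
    (void)argc; (void)argv;
}
```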

Detecting rebooted instances as failing

Redis Sentinel is already able to detect reboots of Redis instances by checking for differences in the runid INFO field. Sentinel (and later Cluster) should be able to mark a rebooted master as failing, so that the reboot event can trigger a failover even if the instance was unavailable only for a short time compared to the failover unavailability trigger setting (down-after-milliseconds).
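
A minimal sketch of the runid check, with hypothetical names (this is not the actual sentinel.c code):

```c
#include <string.h>

typedef struct {
    char runid[41]; /* 40 hex chars plus null terminator, as Redis run IDs. */
    int failing;
} sentinelInstance;

/* If the run_id reported in a master's INFO output differs from the one we
 * cached, the process was restarted; under the proposed behavior we mark
 * the instance as failing so a failover is triggered even after a very
 * short outage. */
void checkMasterRunid(sentinelInstance *master, const char *info_runid) {
    if (master->runid[0] != '\0' && strcmp(master->runid, info_runid) != 0)
        master->failing = 1; /* Reboot detected: treat as failed. */
    strncpy(master->runid, info_runid, sizeof(master->runid) - 1);
    master->runid[sizeof(master->runid) - 1] = '\0';
}
```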

Because the replication link is often a very reliable channel to propagate writes compared to non-AOF persistence (or lack of persistence), this would result in better real-world consistency for Redis instances configured with RDB or no persistence at all, since restarts will pass the master role to a slave, and so forth.

However, for this to work, slaves should also not connect to a rebooted master if it looks unreliable from the point of view of the old slave history. And this leads to the third feature:

Slave ability to strictly follow history when reconnecting to the master

Slaves know the old master replication ID and the offset they are up to. When strict history is configured, a slave should only accept successful PSYNC2 replies, or should accept a full resynchronization only when the full sync is needed because:

  1. The slave had no past history at all: it's a new slave.
  2. There was no replication backlog to serve the slave, but otherwise the ID matches and the master offset is in the future.

Otherwise, if the full synchronization is triggered because the master recognizes the ID but its offset is in the past compared to what the slave is reporting (so the slave is more up to date), or because the master does not know the replication ID at all (the trivial case: a master without persistence is rebooted), the slave should stop the synchronization attempt and retry later as usual. In the meanwhile the failover will promote a new master, and the replication should be able to continue.

However, Sentinel and Redis Cluster must be able to override this setting. After a failover happens, a variant of SLAVEOF should be able to force the instance to accept the new master.
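
Putting the cases together, here is a minimal sketch of the slave-side strict-history decision, with hypothetical names (not the real replication.c logic). Given the replication ID and offset from the master's +FULLRESYNC reply, it decides whether the slave may accept the full synchronization:

```c
#include <string.h>

typedef struct {
    char cached_replid[41];      /* Replication ID of our old master. */
    long long slave_repl_offset; /* Offset we reached with that master. */
    int has_history;             /* 0 for a brand new slave. */
    int force_accept;            /* Set by the SLAVEOF variant used by
                                    Sentinel/Cluster after a failover. */
} slaveState;

int acceptFullResync(slaveState *s, const char *master_replid,
                     long long master_offset) {
    if (s->force_accept) return 1; /* Failover override from Sentinel/Cluster. */
    if (!s->has_history) return 1; /* Case 1: new slave with no past history. */
    /* Case 2: same history and the master is ahead; the backlog was just
     * too short to serve a partial resynchronization. */
    if (strcmp(s->cached_replid, master_replid) == 0 &&
        master_offset >= s->slave_repl_offset) return 1;
    /* Unknown ID, or the master is behind us: refuse and retry later. */
    return 0;
}
```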

Conclusions

The implementation of the above changes should significantly improve the reliability of clusters of Redis instances in HA setups whenever AOF is not used, and should also be able to improve synchronization times when AOF is used. This feature currently has no planned ETA and must be designed with care in all its details, so the first step is to follow up with a design document similar to the one for PSYNC2.

@soloestoy (Collaborator) commented Oct 9, 2017

Amazing issue! I love the improvements of PSYNC3, and I am also trying to make PSYNC work after rebooting from AOF ^_^, so let me introduce my design:

  1. Make aof-use-rdb-preamble always be yes.

    At the same time, replication info should be persisted into the AOF with the RDB preamble if the instance is a slave.

  2. Slaves only persist the applied replication stream into the AOF.

    As with the replication stream, the function replicationFeedSlaves() just returns if the instance is a slave; we use
    replicationFeedSlavesFromMasterStream() to propagate instead.

    So, we can also add a function feedAppendOnlyFileFromMasterStream() to persist commands when the instance is a slave, even PING and REPLCONF GETACK.

  3. Translate special commands in the replication stream (see the sketch after this list).

    Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT.

    Translate SETEX/PSETEX into SET plus PEXPIREAT.

    Translate SET [EX seconds] [PX milliseconds] into SET plus PEXPIREAT.

  4. Trigger BGREWRITEAOF when the replication ID has changed.

    Because we need to update the replication info in the AOF.
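
Here is a minimal sketch of the translation in point 3, using hypothetical helper names. Turning a relative TTL into an absolute PEXPIREAT means that replaying the stream or the AOF later cannot extend a key's lifetime:

```c
#include <stdio.h>
#include <time.h>

/* Milliseconds since the epoch (second resolution is enough for a sketch). */
static long long mstime(void) { return (long long)time(NULL) * 1000; }

/* Rewrite "EXPIRE key seconds" as "PEXPIREAT key when-ms" into 'out'. */
void translateExpire(const char *key, long long seconds, char *out, size_t n) {
    snprintf(out, n, "PEXPIREAT %s %lld", key, mstime() + seconds * 1000);
}
```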

Now that I have seen your design, I think yours is better, and I also have some suggestions:

  1. Since we have recorded the offset, how about recording a timestamp too?

    I think a timestamp is useful when we want to check the AOF.

  2. It is necessary to translate special commands in the replication stream.

    Because the EXPIRE command may lead to inconsistency if a slave can partially sync with the master after rebooting from AOF.

Thanks

@fdingiit commented

Can you give an example of the benefits of using a timestamp, please? @soloestoy

@xuguruogu commented

> Since we have recorded the offset, how about recording a timestamp too?
> I think a timestamp is useful when we want to check the AOF.

This may not be useful if the master changes its replication ID when it restarts.

@soloestoy (Collaborator) commented

@xuguruogu I don't think so: the timestamp is used to find out when a key was updated; it has nothing to do with the replication ID.

@0xtonyxia (Contributor) commented Jan 26, 2018

Great improvements to PSYNC2! IMHO AOF is used quite often in production environments for data safety, and I also have a few suggestions to share.

  • The AOFCONFIG annotated 1 annotation is like meta-info attached to every single command in the AOF, so we should make it easy to extend to support future requirements; adding a version field is a good idea. And about the timestamp, @xuguruogu, I agree with @soloestoy: it's useful when we troubleshoot problems such as data loss or data corruption, and furthermore we can also implement PITR (point-in-time recovery) based on it. I think storing the meta-info in a binary format is a good choice to save some space.

  • AOF annotation is more useful than just the reboot situation mentioned in the issue above. The replication backlog is relatively too small for the master to accept a partial resynchronization when we face relatively big write traffic, for example 20 MB/s. So in this situation I think we can read the incremental data the slave is lacking directly from the AOF, because disk space is always big enough; of course the meta-info is sent too, so that the slave can get the right replication offset.

  • As for the second suggestion, because of AOF rewrite we may not be able to find the resynchronization point in the AOF for slaves, since an AOF rewrite eliminates all the history information. So here I propose a new persistence method (see the sketch after this list).

    • AOF rewrite is abandoned, and the AOF is automatically rotated into multiple log files.
    • When we need an AOF rewrite, a bgsave is performed instead. We record the AOF log name corresponding to the newly generated RDB, for example A.aof, and AOF logs generated before A.aof can be deleted.
    • We can keep as many AOF logs as we want; it's configurable.
  • The new method sounds very aggressive ^_^. We have implemented it in ApsaraCache, which is an open source project based on Redis 4.0. I'm very much hoping to hear your opinions, @antirez.
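
A minimal sketch of the rotation bookkeeping described above, with hypothetical names; the actual ApsaraCache implementation may differ. When a bgsave completes, every AOF segment older than the one that was active when the snapshot started is no longer needed for recovery:

```c
#include <stdio.h>
#include <string.h>

#define MAX_SEGMENTS 16

typedef struct {
    char names[MAX_SEGMENTS][32]; /* Rotated AOF segments, oldest first. */
    int count;
} aofSegments;

/* Drop (here: just print) every segment strictly older than 'keep_from',
 * the segment recorded alongside the newly generated RDB (e.g. "A.aof"). */
void purgeOldSegments(aofSegments *segs, const char *keep_from) {
    int i, first_kept = 0;
    for (i = 0; i < segs->count; i++)
        if (strcmp(segs->names[i], keep_from) == 0) { first_kept = i; break; }
    for (i = 0; i < first_kept; i++)
        printf("unlink %s\n", segs->names[i]); /* unlink(2) in a real system. */
    /* Compact the array so only keep_from and newer segments remain. */
    memmove(segs->names[0], segs->names[first_kept],
            (size_t)(segs->count - first_kept) * sizeof(segs->names[0]));
    segs->count -= first_kept;
}
```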

@eduardobr (Contributor) commented Jul 21, 2021

I know there hasn't been much movement in this thread, but do we have updated plans or intentions to implement this in the near future?

One case that seems important is for a Redis Standalone setup with 1 master and multiple replicas (persistence enabled on all instances) to be able to restart the master without making the replicas run a full SYNC (which makes all of them return LOADING status at the same time and brings down the system).
Is that possible in any scenario nowadays? For example, with some configuration to tell the master that it really is in a standalone setup and can store a replication ID and safely reuse it after restarting.

@oranagra (Member) commented

@eduardobr this issue is discussed in #8015 and the related issue.
It seems to have been forgotten... I'll go try to wake it up.
