PSYNC3: next improvements to Redis replication #4357
Comments
Amazing issue! I love the PSYNC3 improvements. I am also trying to make PSYNC work after rebooting from an AOF ^_^, so let me introduce my design:
Now that I have seen your design, I think yours is better. I also have some suggestions:
Thanks
Can you give an example of the benefits of using a timestamp, please? @soloestoy
This may not be useful if the master changes its replication ID when it restarts.
@xuguruogu I don't think so: the timestamp is used to find when the key was updated, so it does not depend on the replication ID.
Great improvements to PSYNC2. IMHO AOF is used quite often in production environments for data safety, and I also have a few suggestions to share.
I know there hasn't been much movement in this thread, but are there updated plans or intentions to implement this in the near future? One case that seems important is a standalone Redis setup with 1 master and multiple replicas (persistence enabled on all instances): it should be possible to restart the master without making the replicas run a full SYNC, which makes all of them return LOADING at the same time and brings the system down.
@eduardobr this issue is discussed in #8015 and the related issue. |
This issue is called PSYNC3 because, just as PSYNC2 identified a set of replication improvements, the ones described here are the next improvements planned for Redis replication. The main focus of the improvements described here is:
This issue supersedes #2087 because, AFAIK, it is just a better version of what was proposed there. PSYNC3 features are based on improvements in PSYNC2 that were not available back when protected restarts were proposed.
The individual features that compose PSYNC3 are:
AOF annotations
Starting with PSYNC2 in Redis 4.0, RDB files are able to store replication information such as the replication ID and offset. Among other things, this allows a rebooted instance to continue replicating incrementally from the master instead of restarting replication from scratch with a full resynchronization.
AOF should optionally be able to do the same. For every replicated command, we should also be able to emit the corresponding master replication offset. Moreover, when the AOF is created (if empty), and every time the replication ID changes, the new replication ID should be emitted as well.
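A minimal sketch of the bookkeeping this implies, assuming the layout proposed further down in this issue: the file starts with AOFCONFIG annotated 1, the first argument of every replicated command is the master replication offset after that command, and REPLCONF SET-ID-AND-OFFSET records replication ID changes (its argument order, ID then offset, is an assumption). Since the Redis replication offset counts bytes of the replication stream, the sketch advances it by the size of each command's RESP encoding; other stream traffic that also moves the real offset (SELECT, PING) is ignored. On reload it rebuilds the request a replica sends for a partial resync, PSYNC <replid> <offset+1>, falling back to PSYNC ? -1 when no history is found. All class and function names are hypothetical.

```python
# Hypothetical sketch of PSYNC3 annotated AOF bookkeeping (not Redis code).

def resp_encode(args: list[bytes]) -> bytes:
    """Encode one command as a RESP array of bulk strings, as AOF files do."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        parts.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(parts)


class AnnotatedAofWriter:
    """Builds an annotated AOF in memory (hypothetical helper)."""

    def __init__(self, replid: str, offset: int) -> None:
        self.replid, self.offset = replid, offset
        self.buf = bytearray()
        # Header: mark the file as annotated, then record the starting ID/offset.
        self.buf += resp_encode([b"AOFCONFIG", b"annotated", b"1"])
        self._emit_id_and_offset()

    def _emit_id_and_offset(self) -> None:
        # Assumed argument order: replication ID, then offset.
        self.buf += resp_encode([b"REPLCONF", b"SET-ID-AND-OFFSET",
                                 self.replid.encode(), str(self.offset).encode()])

    def append_replicated(self, args: list[bytes]) -> None:
        # The offset advances by the size of the command as propagated by the
        # master (without the annotation); the new value becomes argument 0.
        self.offset += len(resp_encode(args))
        self.buf += resp_encode([str(self.offset).encode()] + args)

    def change_id(self, replid: str, offset: int) -> None:
        # New PSYNC, role change, failover...: record the new history point.
        self.replid, self.offset = replid, offset
        self._emit_id_and_offset()


def parse_resp_commands(data: bytes) -> list[list[bytes]]:
    """Split a sequence of RESP arrays back into argument lists."""
    commands, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError("expected RESP array at byte %d" % pos)
        end = data.index(b"\r\n", pos)
        count, pos = int(data[pos + 1:end]), end + 2
        args = []
        for _ in range(count):
            end = data.index(b"\r\n", pos)
            size, pos = int(data[pos + 1:end]), end + 2
            args.append(bytes(data[pos:pos + size]))
            pos += size + 2  # skip the payload and its trailing CRLF
        commands.append(args)
    return commands


def recover_psync_request(data: bytes) -> list[str]:
    """After a reboot, rebuild the PSYNC a replica would send from the AOF."""
    replid, offset = None, None
    for cmd in parse_resp_commands(data):
        if cmd[:2] == [b"REPLCONF", b"SET-ID-AND-OFFSET"]:
            replid, offset = cmd[2].decode(), int(cmd[3])
        elif cmd[0] != b"AOFCONFIG":
            offset = int(cmd[0])  # annotated command: offset after execution
    if replid is None or offset is None:
        return ["PSYNC", "?", "-1"]  # no usable history: full resync
    # Replicas ask to continue from the byte after the last one applied.
    return ["PSYNC", replid, str(offset + 1)]
```

For example, starting at offset 0 and appending SET k v (27 bytes in RESP) leaves the writer at offset 27, so after a reboot the replica would send PSYNC with offset 28.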
One possibility is to declare, in the very first command of the AOF file, that it is an annotated AOF file, so the loader knows it immediately, via a single command like:
AOFCONFIG annotated 1
When this option is turned on, the first argument of every AOF command describes the replication offset after the command execution. So conceptually every command is like the following:
Moreover, when the replication ID/offset changes because of a new PSYNC attempt, a replication role change, or any other similar event, a REPLCONF SET-ID-AND-OFFSET command is emitted in the AOF file.
Detecting rebooted instances as failing
Redis Sentinel is already able to detect reboots of Redis instances by checking for changes in the run_id INFO field. Sentinel (and later Cluster) should be able to mark a rebooted master as failing, so that the reboot event can trigger a failover even if the instance was unavailable only for a short time compared to the failover unavailability trigger setting (down-after-milliseconds).
Because the replication link is often a much more reliable channel for propagating writes than non-AOF persistence (or no persistence at all), this would result in better real-world consistency for Redis instances configured with RDB or no persistence, since a restart would pass the master role to a slave, and so forth.
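The reboot detection described above can be sketched as follows: the INFO server section contains a run_id field that changes on every restart, so comparing it with the last observed value reveals a reboot even when the instance was down for less than down-after-milliseconds. Marking the master as failing on that event is the new behavior proposed here. The class and function names are hypothetical, and the sketch ignores quorum and leader election, which a real Sentinel failover also requires.

```python
from typing import Optional


def parse_info(text: str) -> dict[str, str]:
    """Parse INFO output: 'key:value' lines, '#' section headers skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


class RebootDetector:
    """Hypothetical per-master state kept by a monitoring process."""

    def __init__(self) -> None:
        self.last_run_id: Optional[str] = None
        self.failing = False

    def observe(self, info_text: str) -> bool:
        """Feed one INFO reply; return True if a reboot was detected."""
        run_id = parse_info(info_text).get("run_id")
        rebooted = (self.last_run_id is not None and run_id is not None
                    and run_id != self.last_run_id)
        if rebooted:
            # Proposed behavior: treat the reboot itself as a failure, so a
            # failover can start regardless of down-after-milliseconds.
            self.failing = True
        if run_id is not None:
            self.last_run_id = run_id
        return rebooted
```

The first INFO reply only records the run_id; a later reply with a different run_id is what flags the reboot.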
However, for this to work, slaves should also refuse to connect to a rebooted master if it looks unreliable from the point of view of the slave's own replication history. This leads to the third feature:
Slave ability to strictly follow history when reconnecting to the master
Slaves know the old master replication ID and the offset they are up to. When strict history is configured, a slave should only accept successful PSYNC2 replies, or accept a full resynchronization only when it is needed because:
Otherwise, if the full synchronization is triggered because the master recognizes the ID but its offset is in the past compared to what the slave reports (so the slave is more up to date), or because the master does not know the replication ID at all (the trivial case: a master without persistence was rebooted), the slave should abort the synchronization attempt and retry later as usual. In the meantime, the failover will promote a new master and replication should be able to continue.
However, Sentinel and Redis Cluster must be able to override this setting. After a failover happens, a variant of SLAVEOF should be able to force the instance to accept the new master.
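The strict-history rule can be summarized as a decision the slave takes after its PSYNC attempt. The sketch below covers only the cases spelled out above: a successful partial resync is kept; a full resync is refused when the master does not know the slave's replication ID, or knows it but is behind the slave's offset; and a forced flag stands in for the SLAVEOF variant Sentinel and Cluster would use after a failover. The list of acceptable full-sync reasons is not given in this text, so every other full resync is accepted here as an assumption. How the slave learns whether the master knows its ID is left abstract, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"          # partial resync succeeded, keep going
    ACCEPT_FULL = "accept_full"    # full resync allowed
    REFUSE_RETRY = "refuse_retry"  # abort the attempt, retry later


@dataclass
class MasterView:
    """What the slave learned about the master during the handshake (hypothetical)."""
    partial_ok: bool     # master accepted the PSYNC (+CONTINUE)
    knows_replid: bool   # master recognizes the slave's replication ID
    master_offset: int   # master offset for that history


def strict_history_decision(slave_offset: int, master: MasterView,
                            strict: bool = True, forced: bool = False) -> Decision:
    if master.partial_ok:
        return Decision.CONTINUE
    if not strict or forced:
        # Non-strict mode, or forced by Sentinel/Cluster after a failover.
        return Decision.ACCEPT_FULL
    if not master.knows_replid:
        # E.g. a master without persistence was rebooted: its history is lost.
        return Decision.REFUSE_RETRY
    if master.master_offset < slave_offset:
        # The master is in the past: the slave is more up to date.
        return Decision.REFUSE_RETRY
    # Assumption: any other full-sync reason is treated as acceptable.
    return Decision.ACCEPT_FULL
```

For example, a slave at offset 100 meeting a master that recognizes its ID but only reached offset 50 gets REFUSE_RETRY, unless forced=True is passed after a failover.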
Conclusions
The implementation of the above changes should significantly improve the reliability of clusters of Redis instances in HA setups whenever AOF is not used, and should also improve synchronization times when AOF is used. This feature currently has no planned ETA and must be designed carefully in every detail, so the first step is to follow up with a design document similar to the one for PSYNC2.