kube-apiserver endpoint cleanup when --apiserver-count>1 #22609
Comments
I don't believe that issue is related. When I take a master down, I can see the node and its pods being removed. What's not being removed is the second apiserver IP in the kubernetes endpoints.
cc @mikedanese @lavalamp, they might have seen this issue before.
This isn't exactly a bug. It's by design, but the design could likely be better.
Yes, we don't expect an apiserver to go down and stay down without a replacement.
We could make this more robust by having the apiservers count themselves, e.g. by each one separately making an entry in etcd somewhere. For now, if you change the number of apiservers that are running, you must update the --apiserver-count= flag and restart all apiservers.
What happens if one of them crashes or the machine goes down?
This is actually an important issue that can break the high availability of the whole cluster.
Agreed with @victorgp. We are creating a high-availability cluster, and some services like DNS and Traefik rely on the default kubernetes endpoint. I can work around this by forcing them to use the load balancer's URL directly (basically not using the default endpoint), but I feel the service endpoints for kubernetes should be consistent with the rest.
Does this still exist, @ncdc, after the recent changes to endpoint updates?
This issue still exists with 1.4.1.
@timothysc yes, it still exists. We have an endpoint reconciler for the kubernetes service in OpenShift that uses etcd 2's key TTL mechanism to maintain a lease. If an apiserver goes down, one of the remaining members will remove the dead backend IP from the list of endpoints. If we can agree on a mechanism to do this in Kube (there were some concerns about etcd key TTL), I'd be happy to put together a PR.
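For illustration, here is a minimal sketch of the kind of TTL lease described above, using the etcd v2 keys API. The import path, key prefix, IP, and TTL are assumptions for the example, not the actual OpenShift reconciler code:

```go
package main

import (
	"context"
	"log"
	"time"

	etcd "go.etcd.io/etcd/client" // etcd v2 keys API (assumed import path)
)

func main() {
	c, err := etcd.New(etcd.Config{Endpoints: []string{"http://127.0.0.1:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	kapi := etcd.NewKeysAPI(c)

	// Hypothetical key layout: one key per apiserver IP under a shared prefix.
	key := "/kubernetes/apiserver-leases/10.0.0.1"

	// Each apiserver periodically refreshes its own key with a short TTL.
	// If the apiserver dies, the key expires, so the remaining members stop
	// listing that IP when they rebuild the endpoints list.
	for {
		_, err := kapi.Set(context.Background(), key, "10.0.0.1",
			&etcd.SetOptions{TTL: 15 * time.Second})
		if err != nil {
			log.Printf("failed to refresh lease: %v", err)
		}
		time.Sleep(5 * time.Second)
	}
}
```

On the read side, each surviving apiserver's reconcile loop would list the keys under that prefix and rebuild the kubernetes endpoints from whichever leases are still alive.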
👍
This problem came up when discussing the HA master design, which made me think that maybe there's a better solution. As you say, we should be using a TTL for each IP: each apiserver keeps refreshing its own entry, and entries that are not refreshed expire and get removed from the endpoints.
This should be a very simple change and hopefully would solve this issue. WDYT?
@fgrzadkowski that's essentially what we're doing in OpenShift here, although we use a separate path in etcd to store the data instead of using the existing endpoints path.
Do you think that baking the logic I described above into the apiserver here would make sense? @roberthbailey @jszczepkowski @lavalamp @krousey @nikhiljindal
Slight modification after discussion with @thockin and @jszczepkowski. We believe that a reasonable approach would be to have each apiserver periodically record its IP and an expiration time in a dedicated ConfigMap, and to reconcile the Endpoints against that.
That way we will have a dynamic configuration and at the same time we will not be updating Endpoints too often, as expiration times will be stored in a dedicated ConfigMap.
I assume you'd add retries in the event that 2 apiservers tried to update the ConfigMap at the same time?
Updating endpoints is bad, in that every write fans out to the entire cluster (all kube-proxies, any network plugins, all ingress controllers that watch endpoints, and any custom firewall controllers). At 5k nodes that is a lot of traffic. Updating another resource like a ConfigMap is less bad. Least bad would be updating a resource that no one watches globally and that is designed for this purpose.
5 minutes seems really long... I'd expect an unresponsive master to get its IP pulled out of the kubernetes service endpoints way sooner (more like a 15-30 second response time). The masters contending on a single resource to heartbeat also seems like it could be problematic.
We run 2 masters in AWS with an ASG. If we tear down one master it takes about 5-10 minutes to get a replacement, which is fine for all load-balanced applications. If you want an HA Kubernetes you have to set --apiserver-count=N, where N>1, but this means the "kubernetes" endpoints will not clean up the torn-down master from above. This is not how normal load balancers work, and I think it is much worse than an unavailable control plane!
Thank you for the feedback. Proposal: add a ConfigMap and populate it with each apiserver's IP and its heartbeat/expiration time. In the reconcile endpoints loop, expire the endpoint after a configured period of time (~1 minute?).
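As a rough illustration of that reconcile step, the pruning decision could look like the sketch below. The function name, map shape, and one-minute TTL are assumptions for the example, not the proposed implementation:

```go
package main

import (
	"fmt"
	"time"
)

// pruneExpired returns only the apiserver IPs whose last heartbeat,
// as read back from the ConfigMap data, is newer than the allowed TTL.
func pruneExpired(heartbeats map[string]time.Time, ttl time.Duration, now time.Time) []string {
	var live []string
	for ip, last := range heartbeats {
		if now.Sub(last) <= ttl {
			live = append(live, ip)
		}
	}
	return live
}

func main() {
	now := time.Now()
	heartbeats := map[string]time.Time{
		"10.0.0.1": now.Add(-20 * time.Second), // recently refreshed: keep
		"10.0.0.2": now.Add(-5 * time.Minute),  // stale: drop from endpoints
	}
	fmt.Println(pruneExpired(heartbeats, time.Minute, now))
}
```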
This session affinity causes problems if one of the API servers is down. If a client has a connection to the API server that fails, it will continue to connect to that node, because the session affinity tries to steer connections back to the failed node. (There is a related issue that causes a failed API server to never be removed from the list of valid service endpoints. See: kubernetes/kubernetes#22609)
Automatic merge from submit-queue

add apiserver-count fix proposal

This is a proposal to fix the apiserver-count issue at kubernetes/kubernetes#22609. I would appreciate a review of the proposal.
- [x] Add ConfigMap for configurable options
- [ ] Find out dependencies on the Endpoints API and add them to the proposal
…t_reconciler Automatic merge from submit-queue (batch tested with PRs 52240, 48145, 52220, 51698, 51777). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

add lease endpoint reconciler

**What this PR does / why we need it**: Adds OpenShift's LeaseEndpointReconciler to register kube-apiserver endpoints within the storage registry. Adds a command-line argument `alpha-endpoint-reconciler-type` to the kube-apiserver. Defaults to the old MasterCount reconciler.

**Which issue this PR fixes**: fixes kubernetes/community#939, fixes kubernetes#22609

**Release note**:
```release-note
Adds a command-line argument to kube-apiserver called --alpha-endpoint-reconciler-type=(master-count, lease, none) (default "master-count"). The original reconciler is 'master-count'. The 'lease' reconciler uses the storageapi and a TTL to keep alive an endpoint within the `kube-apiserver-endpoint` storage namespace. The 'none' reconciler is a noop reconciler that does not do anything. This is useful for self-hosted environments.
```
/cc @lavalamp @smarterclayton @ncdc
When a cluster is bootstrapped with multiple kube-apiservers, the `kubernetes` service contains a list of all of their endpoints. By default, this list of endpoints will *not* be updated if one of the apiservers goes down, which can make the API appear unresponsive to anything routed to the dead instance. To have the endpoints automatically track which apiservers are available, kube-apiserver needs to be started with the `--endpoint-reconciler-type` option set to `lease`. (The default option in 1.10, `master-count`, only changes the endpoints when the count changes: https://github.com/apprenda/kismatic/issues/987.) See: kubernetes/kubernetes#22609, kubernetes/kubernetes#56584, kubernetes/kubernetes#51698
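As a quick way to check the behavior after switching to the lease reconciler, a small client-go sketch like the one below can list which addresses are currently in the `kubernetes` endpoints. The kubeconfig path and client-go version used here are assumptions, not something prescribed in this issue:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default path; in-cluster config would also work.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The `kubernetes` service lives in the default namespace; with the lease
	// reconciler, a dead apiserver's IP should disappear from this list within
	// roughly the lease TTL.
	ep, err := cs.CoreV1().Endpoints("default").Get(context.TODO(), "kubernetes", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			fmt.Println(addr.IP)
		}
	}
}
```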
Using v1.2.0-beta.0 and running --apiserver-count=2 in my cluster, I would expect the service endpoint to be cleaned up when one of the apiservers goes offline. This doesn't happen though, and causes ~50% of apiserver requests to fail.