etcd component status check should include credentials #39716
Conversation
Hi @zhouhaibing089. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
squashed and removed the WIP.
This seems like a good idea but there are some things I would do differently. I'm strapped for time so maybe @mml can do an initial review?
Hey guys, may I request that someone review this, please? I think we are blocked on this to set up a full-TLS kubernetes cluster.
The caching/threadsafety comments are the important ones, everything else is readability. Thanks. Sorry for the long delay.
-func (server *Server) DoServerCheck(prober httpprober.HTTPProber) (probe.Result, string, error) {
+func (server *Server) DoServerCheck() (probe.Result, string, error) {
+	// setup the prober
+	if server.Prober == nil {
I think you probably need a lock or a sync.Once to protect this logic.
Yeah, updated to use sync.Once.
The server.Prober == nil check outside the sync.Once call will still cause race issues with the assignment inside the sync.Once call.
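A minimal sketch of the pattern the reviewer is asking for (the type and field names here are hypothetical, not the actual kubernetes code): both the nil check and the assignment live inside once.Do, so no goroutine ever reads or writes the field unsynchronized, and every caller sees the same lazily built prober.

```go
package main

import (
	"fmt"
	"sync"
)

// prober is a stand-in for the HTTP prober discussed in this thread.
type prober struct{ name string }

// Server builds its prober lazily; all access to the prober field
// happens inside once.Do, so there is no racy outer nil check.
type Server struct {
	once   sync.Once
	prober *prober
}

func (s *Server) getProber() *prober {
	s.once.Do(func() {
		s.prober = &prober{name: "lazy"}
	})
	return s.prober
}

func main() {
	s := &Server{}
	var wg sync.WaitGroup
	results := make([]*prober, 8)
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = s.getProber()
		}(i)
	}
	wg.Wait()
	// Every goroutine must observe the same single instance.
	for _, p := range results[1:] {
		if p != results[0] {
			panic("expected a single shared prober instance")
		}
	}
	fmt.Println("all goroutines saw the same prober")
}
```

Running this under the race detector (go run -race) is a quick way to confirm the difference: moving the nil check outside once.Do reintroduces the reported race.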
@@ -96,7 +93,7 @@ func ToConditionStatus(s probe.Result) api.ConditionStatus {
 }

 func (rs *REST) getComponentStatus(name string, server Server) *api.ComponentStatus {
-	status, msg, err := server.DoServerCheck(rs.prober)
+	status, msg, err := server.DoServerCheck()
Since you take server by value, it is going to have to construct a prober every time; it won't cache it. I guess you probably want to start storing pointers in the map.
Updated to storing pointers in the map.
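A minimal illustration of why value-typed map entries defeat the caching (the server type here is hypothetical, not the real componentstatus.Server): reading a value out of the map copies the struct, so lazily built state is written to the copy and lost, while a pointer-valued map shares one instance.

```go
package main

import "fmt"

// server caches an expensive-to-build value on first use.
type server struct{ cached int }

func (s *server) get() int {
	if s.cached == 0 {
		s.cached = 42 // pretend this is an expensive construction
	}
	return s.cached
}

func main() {
	// Value-typed map: the read copies the struct, so the cache
	// populated by get() never makes it back into the map.
	byValue := map[string]server{"etcd-0": {}}
	v := byValue["etcd-0"]
	v.get()
	fmt.Println("by value, cache after call:", byValue["etcd-0"].cached)

	// Pointer-typed map: the method mutates the shared instance.
	byPointer := map[string]*server{"etcd-0": {}}
	byPointer["etcd-0"].get()
	fmt.Println("by pointer, cache after call:", byPointer["etcd-0"].cached)
}
```

This prints 0 for the value-typed map and 42 for the pointer-typed one, which is exactly the "construct a prober every time" behavior the reviewer pointed out.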
pkg/probe/http/http.go (outdated diff)
@@ -36,6 +36,16 @@ func New() HTTPProber {
 	return httpProber{transport}
 }

+// NewWithCert creates a prober which will also setup client certificates
Optional: what about NewWithClientCert?
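A sketch of what a NewWithClientCert-style constructor might look like (the name follows the reviewer's suggestion; the signature is an assumption, not the merged API): it loads the PEM key pair and wires it into the transport's TLS configuration.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newWithClientCert is like a plain constructor, but the resulting
// transport also presents a client certificate loaded from the given
// PEM files (names hypothetical, per the review discussion).
func newWithClientCert(certFile, keyFile string) (*http.Transport, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	return &http.Transport{
		TLSClientConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
		},
	}, nil
}

func main() {
	// Without real credentials on disk this fails cleanly, which is
	// all the sketch can demonstrate here.
	_, err := newWithClientCert("client.crt", "client.key")
	fmt.Println("error for missing files:", err != nil)
}
```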
for server := range servers {
	backends = append(backends, Backend{
		Server:  server,
		KeyFile: s.StorageConfig.KeyFile,
Are all etcd servers using the same client cert? Hm, I guess they must.
Yeah, all the configurations are inherited from storagebackend.Config; if they have to use a different one, we will need to update that part as well.
// for health validations
type Backend struct {
	Server string
	// TLS credentials
Document: these will be empty if it is not an HTTPS connection. Better yet, make a type ClientCert {Key, Cert string} and put here a ClientCert *ClientCert, so it is clear it can be nil?
Re: "these will be empty if it is not an HTTPS connection" — the client certificate is only required when the server enables client-cert-auth. There are already many places using CertFile and KeyFile around, like storagebackend.Config, so I just left it as it is.
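For reference, the reviewer's ClientCert suggestion could look roughly like this (a sketch only; the merged PR kept plain CertFile/KeyFile strings as the reply above explains). The nil pointer makes "no client certificate configured" a single explicit state rather than two independently empty fields.

```go
package main

import "fmt"

// ClientCert groups the TLS client credentials, per the reviewer's
// suggestion (names hypothetical).
type ClientCert struct {
	CertFile string
	KeyFile  string
}

// Backend describes a storage server to health-check.
type Backend struct {
	Server     string
	ClientCert *ClientCert // nil when no client certificate is configured
}

func main() {
	plain := Backend{Server: "http://127.0.0.1:2379"}
	secure := Backend{
		Server:     "https://127.0.0.1:2379",
		ClientCert: &ClientCert{CertFile: "client.crt", KeyFile: "client.key"},
	}
	fmt.Println("plain has cert:", plain.ClientCert != nil)
	fmt.Println("secure has cert:", secure.ClientCert != nil)
}
```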
// Backend describes the storage servers, the information here should be enough
// for health validations
type Backend struct {
	Server string
Document whether this includes the scheme (http/https), or just the IP/hostname?
@@ -30,6 +30,15 @@ import (
 	"github.com/golang/glog"
 )

+// Backend describes the storage servers, the information here should be enough
+// for health validations
nit: add trailing period.
type Backend struct {
	Server string
	// TLS credentials
	KeyFile string
Are these really file paths rather than the contents of the key/cert files? Seems odd to pass around file paths.
etcd has support for client-cert-auth, which can be configured via the flag --ca-file. When that is enabled, all client requests must present a client certificate. However, the current component status check uses a single transport for all of the checks, which is wrong: the checks should differ per component, and each should use its own transport (TLS configuration).
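The per-backend transport idea in the description can be sketched as follows (field names are modeled on the discussion above, not the exact merged code): a backend with credentials gets its own TLS-configured transport, while a plain backend keeps a default one, instead of every component check sharing a single transport.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

// Backend describes one storage server and its optional TLS material.
type Backend struct {
	Server   string
	CertFile string
	KeyFile  string
	CAFile   string
}

// transportFor builds a transport matching this backend's TLS
// configuration, rather than sharing one transport across all checks.
func transportFor(b Backend) (*http.Transport, error) {
	cfg := &tls.Config{}
	if b.CAFile != "" {
		pem, err := os.ReadFile(b.CAFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		pool.AppendCertsFromPEM(pem)
		cfg.RootCAs = pool // verify the etcd server against its CA
	}
	if b.CertFile != "" || b.KeyFile != "" {
		cert, err := tls.LoadX509KeyPair(b.CertFile, b.KeyFile)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert} // client-cert-auth
	}
	return &http.Transport{TLSClientConfig: cfg}, nil
}

func main() {
	tr, err := transportFor(Backend{Server: "http://127.0.0.1:2379"})
	fmt.Println("plain backend got a transport:", tr != nil, "err:", err)
}
```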
/approve
@liggitt mind adding the
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: liggitt, smarterclayton, zhouhaibing089.
Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing
Automatic merge from submit-queue
@liggitt Will this change be merged to the 1.5 branch?
No; since this is not fixing a regression, new functionality is not backported to old releases.
OK. So this will be available in 1.7, am I right?
Yes
Can this fix be backported or not?
What is the effect of having an unhealthy output? I can see that deployments and daemon sets are working fine, but I could not be sure about this message. Gentle ping @liggitt, @zhouhaibing089 and others.
@cemo hope this helps.
Thank you so much. 👍 |
@cemo, @zhouhaibing089 the effect is that if all traffic is encrypted by mutual TLS, this prevents
There is a bug in Kubernetes 1.6.1 that causes an error when validating the kubernetes environment and etcd (kubernetes/kubernetes#39716). I found that using the 1.7.0 version I did not get this error. Affects the README, this file, and the client configuration (moving to 1.7.0 to match).
I'm ambivalent. The componentstatus API has been broken this way since it was introduced (cf. #17737). There are still other assumptions the componentstatuses API makes that are incorrect (that the controller-manager and scheduler are running on unsecured localhost, for example). I don't really think backporting one aspect of componentstatuses is high priority.
pkg/genericapiserver.Backend, pkg/registry/core/componentstatus.Server, and pkg/probe/http.httpProber should accept the TLS credentials. Now it is working.
Fixes #27343.