
kubeadm blocks waiting for 'control plane' #33544

Closed

Description

sebgoa (Contributor)

Hi @kubernetes/sig-cluster-lifecycle

I tried to follow the docs for kubeadm on CentOS 7.1.

It seems that kubeadm init blocks waiting for the 'control plane to become ready' even though all the containers are running.

# kubeadm init --token foobar.1234
<util/tokens> validating provided token
<master/tokens> accepted provided token
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready

Here are the running containers on the same master machine:

$ sudo docker ps
CONTAINER ID        IMAGE                                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
30aff4f98753        gcr.io/google_containers/kube-apiserver-amd64:v1.4.0            "/usr/local/bin/kube-"   3 minutes ago       Up 3 minutes                            k8s_kube-apiserver.c44dda3f_kube-apiserver-k8ss-head_kube-system_6b83c87a9bf5c380c6f948f428b23dd1_408af885
8fd1842776ab        gcr.io/google_containers/kube-controller-manager-amd64:v1.4.0   "/usr/local/bin/kube-"   3 minutes ago       Up 3 minutes                            k8s_kube-controller-manager.a2978680_kube-controller-manager-k8ss-head_kube-system_5f805ed49f6fd9f0640be470e3dea2a2_7ac41d83
32b7bfb55dc0        gcr.io/google_containers/kube-scheduler-amd64:v1.4.0            "/usr/local/bin/kube-"   3 minutes ago       Up 3 minutes                            k8s_kube-scheduler.1b5cde04_kube-scheduler-k8ss-head_kube-system_586d16be4ecaac95b0162c5d11921019_0ca14012
8a1797fdb1df        gcr.io/google_containers/etcd-amd64:2.2.5                       "etcd --listen-client"   8 minutes ago       Up 8 minutes                            k8s_etcd.4ffa9846_etcd-k8ss-head_kube-system_42857e4bd57d261fc438bcb2a87572b9_f1b219d3
292bcafb3316        gcr.io/google_containers/pause-amd64:3.0                        "/pause"                 8 minutes ago       Up 8 minutes                            k8s_POD.d8dbe16c_kube-controller-manager-k8ss-head_kube-system_5f805ed49f6fd9f0640be470e3dea2a2_fe9592ab
ab929dd920a2        gcr.io/google_containers/pause-amd64:3.0                        "/pause"                 8 minutes ago       Up 8 minutes                            k8s_POD.d8dbe16c_kube-apiserver-k8ss-head_kube-system_6b83c87a9bf5c380c6f948f428b23dd1_c93e3a3b
71c28763aeab        gcr.io/google_containers/pause-amd64:3.0                        "/pause"                 8 minutes ago       Up 8 minutes                            k8s_POD.d8dbe16c_kube-scheduler-k8ss-head_kube-system_586d16be4ecaac95b0162c5d11921019_eb12a865
615cb42e0108        gcr.io/google_containers/pause-amd64:3.0                        "/pause"                 8 minutes ago       Up 8 minutes                            k8s_POD.d8dbe16c_etcd-k8ss-head_kube-system_42857e4bd57d261fc438bcb2a87572b9_891fc5db

I tried to join a node but I get a connection refused error, even though there is no firewall...

# kubeadm join --token foobar.1234 <master_ip>
<util/tokens> validating provided token
<node/discovery> created cluster info discovery client, requesting info from "http://185.19.30.178:9898/cluster-info/v1/?token-id=foobar"
error: <node/discovery> failed to request cluster info [Get http://MASTER_IP:9898/cluster-info/v1/?token-id=foobar: dial tcp MASTER_IP:9898: getsockopt: connection refused]

And now I am actually wondering whether init blocks while waiting for nodes to join. According to the docs it should not block, but the kubeadm logs seem to indicate that it does.

Activity

DaspawnW commented on Sep 27, 2016

Same issue for me on an AWS installation, but I can't see any Docker containers running.
Some information about my setup:
Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-38-generic x86_64)
Using http_proxy and https_proxy

export https_proxy=http://<proxy>:<port>
export http_proxy=http://<proxy>:<port>
kubeadm init --cloud-provider aws
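
For reference, when a host uses http_proxy/https_proxy, traffic to the local API server and the in-cluster ranges usually also needs to be excluded via no_proxy, otherwise kubeadm's client calls may be sent through the proxy. A minimal sketch, with placeholder addresses that are not from this report:

# placeholders: <master_ip> is this machine's address; the ranges are common defaults
export no_proxy=127.0.0.1,localhost,<master_ip>,10.96.0.0/12,10.244.0.0/16
export NO_PROXY=$no_proxy
kubeadm init --cloud-provider aws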

I looked at the logs of the API server. It keeps returning TLS handshake errors:

I0927 11:44:47.425374       1 handlers.go:162] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (793.43µs) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:47.427858       1 handlers.go:162] PUT /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.682203ms) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:47.606685       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57328: remote error: bad certificate
I0927 11:44:47.722809       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57330: remote error: bad certificate
I0927 11:44:47.728099       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57332: remote error: bad certificate
I0927 11:44:48.251368       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57334: remote error: bad certificate
I0927 11:44:48.256871       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57336: remote error: bad certificate
I0927 11:44:48.262479       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57338: remote error: bad certificate
I0927 11:44:48.267460       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57340: remote error: bad certificate
I0927 11:44:48.608406       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57342: remote error: bad certificate
I0927 11:44:48.724428       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57344: remote error: bad certificate
I0927 11:44:48.729680       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57346: remote error: bad certificate
I0927 11:44:48.777612       1 handlers.go:162] GET /healthz: (39.187µs) 200 [[Go-http-client/1.1] 127.0.0.1:49808]
I0927 11:44:49.429761       1 handlers.go:162] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (762.498µs) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:49.432267       1 handlers.go:162] PUT /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (2.070905ms) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:49.614084       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57354: remote error: bad certificate
I0927 11:44:49.727405       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57356: remote error: bad certificate
I0927 11:44:49.732888       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57358: remote error: bad certificate
I0927 11:44:50.080279       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57360: remote error: bad certificate
I0927 11:44:50.085570       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57362: remote error: bad certificate
I0927 11:44:50.617384       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57364: remote error: bad certificate
I0927 11:44:50.730144       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57366: remote error: bad certificate
I0927 11:44:50.735525       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57368: remote error: bad certificate
I0927 11:44:51.433824       1 handlers.go:162] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (769.066µs) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:51.436359       1 handlers.go:162] PUT /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.713977ms) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:51.620964       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57370: remote error: bad certificate
I0927 11:44:51.731724       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57372: remote error: bad certificate
I0927 11:44:51.761983       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57374: remote error: bad certificate
I0927 11:44:52.622487       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57376: remote error: bad certificate
I0927 11:44:52.732927       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57378: remote error: bad certificate
I0927 11:44:52.762908       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57380: remote error: bad certificate
I0927 11:44:53.438270       1 handlers.go:162] GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (805.346µs) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:53.440909       1 handlers.go:162] PUT /api/v1/namespaces/kube-system/endpoints/kube-scheduler: (1.82773ms) 200 [[kube-scheduler/v1.4.0 (linux/amd64) kubernetes/a16c0a7] 127.0.0.1:46848]
I0927 11:44:53.627293       1 logs.go:41] http: TLS handshake error from 10.10.10.10:57382: remote error: bad certificate
yoojinl commented on Sep 27, 2016

@sebgoa Looks similar to #33541; do you have SELinux enabled?
Try running docker ps -a | grep discovery to get the ID of the kube-discovery container, then run docker logs <id> to see whether there is a permission denied error for the /tmp/secret directory.
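
For example (the container ID is a placeholder):

docker ps -a | grep discovery        # note the kube-discovery container ID
docker logs <container_id> 2>&1 | grep -i "permission denied"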

oz123 (Contributor) commented on Sep 27, 2016

@RustyRobot, disabling SELinux on Ubuntu 16.04 does solve the hanging problem.
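
For anyone on the reporter's CentOS 7 setup, switching SELinux to permissive mode would typically look like the sketch below (adjust to your own security policy):

setenforce 0    # permissive until the next reboot
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config    # persist the change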

sebgoa (Contributor, Author) commented on Sep 27, 2016

OK, disabling SELinux got me further and kubeadm init finished. But now there is nothing listening on 9898.

What component is supposed to be listening on that port for cluster joins?

sebgoa (Contributor, Author) commented on Sep 27, 2016

OK, so the discovery service uses a hostPort on 9898.

The logs for that pod return this:

$ kubectl logs kube-discovery-1971138125-yry3x --namespace=kube-system
Error from server: Get https://kube-head:10250/containerLogs/kube-system/kube-discovery-1971138125-yry3x/kube-discovery: dial tcp: lookup kube-head on 8.8.8.8:53: no such host
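
That error is the API server failing to resolve the node name kube-head (it falls through to 8.8.8.8) when proxying the log request to the kubelet. One possible workaround, an assumption on my part and not confirmed in this thread, is to make the hostname resolvable on the master, e.g. via /etc/hosts:

# <master_ip> is a placeholder for the address kube-head should resolve to
echo "<master_ip> kube-head" >> /etc/hosts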

I am following the docs.

The DNS pod is not starting:

Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  27m       27m     1   {default-scheduler }            Normal      Scheduled   Successfully assigned kube-dns-2247936740-igptf to kube-head
  27m       3s      662 {kubelet kube-head}         Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-2247936740-igptf_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-2247936740-igptf_kube-system(00cf8b74-84c2-11e6-9dfa-061eca000139)\" using network plugins \"cni\": cni config unintialized; Skipping pod"
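
The SetupNetworkError above means no CNI pod network has been configured yet, and kube-dns stays pending until a network add-on is applied. A generic sketch; the actual manifest depends on which add-on (Weave Net, Flannel, etc.) you choose:

kubectl apply -f <pod-network-manifest>.yaml    # placeholder for your chosen add-on's manifest
kubectl get pods --namespace=kube-system -w     # watch kube-dns leave Pending once the network is up
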
sebgoa (Contributor, Author) commented on Sep 27, 2016

@errordeveloper looks like this might be right up your alley.

lukemarsden (Contributor) commented on Sep 27, 2016

@sebgoa can you try starting from scratch, following the instructions at http://deploy-preview-1321.kubernetes-io-vnext-staging.netlify.com/docs/getting-started-guides/kubeadm/ please?

oz123 (Contributor) commented on Sep 27, 2016

@lukemarsden, I followed the instructions you posted, and it seems that systemd is immediately starting kubelet:

# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  kubeadm kubectl kubelet kubernetes-cni
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/40.9 MB of archives.
After this operation, 328 MB of additional disk space will be used.
Selecting previously unselected package kubernetes-cni.
(Reading database ... 69111 files and directories currently installed.)
Preparing to unpack .../kubernetes-cni_0.3.0.1-07a8a2-00_amd64.deb ...
Unpacking kubernetes-cni (0.3.0.1-07a8a2-00) ...
Selecting previously unselected package kubelet.
Preparing to unpack .../kubelet_1.4.0-00_amd64.deb ...
Unpacking kubelet (1.4.0-00) ...
Selecting previously unselected package kubectl.
Preparing to unpack .../kubectl_1.4.0-00_amd64.deb ...
Unpacking kubectl (1.4.0-00) ...
Selecting previously unselected package kubeadm.
Preparing to unpack .../kubeadm_1.5.0-alpha.0-1495-g1e7fa1f-00_amd64.deb ...
Unpacking kubeadm (1.5.0-alpha.0-1495-g1e7fa1f-00) ...
Setting up kubernetes-cni (0.3.0.1-07a8a2-00) ...
Setting up kubelet (1.4.0-00) ...
Setting up kubectl (1.4.0-00) ...
Setting up kubeadm (1.5.0-alpha.0-1495-g1e7fa1f-00) ...
root@saltmaster:/home/vagrant# kubeadm init --api-advertise-addresses 172.16.80.80
<master/tokens> generated token: "99f2d4.26fdd8fe96143456"
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
error: <util/kubeconfig> failed to create "/etc/kubernetes/kubelet.conf", it already exists [open /etc/kubernetes/kubelet.conf: file exists]
yoojinl commented on Sep 27, 2016

@oz123 If you already ran kubeadm init, you need to start from scratch, i.e. remove the /etc/kubernetes and /var/lib/etcd directories. We plan to introduce a --reset flag in the future to do this automatically.
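
In other words, something like the sketch below before re-running init (the fuller uninstall script is in the next comment; the init flags are the ones from the earlier attempt):

systemctl stop kubelet
rm -rf /etc/kubernetes /var/lib/etcd    # wipe the state left by the previous kubeadm init
systemctl start kubelet
kubeadm init --api-advertise-addresses 172.16.80.80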

benmathews commented on Sep 27, 2016

There is an uninstall script referenced at http://deploy-preview-1321.kubernetes-io-vnext-staging.netlify.com/docs/getting-started-guides/kubeadm/. After running it, my init ran correctly again.

systemctl stop kubelet;
docker rm -f $(docker ps -q); mount | grep "/var/lib/kubelet/*" | awk '{print $3}' | xargs umount 1>/dev/null 2>/dev/null;
rm -rf /var/lib/kubelet /etc/kubernetes /var/lib/etcd /etc/cni;
ip link set cbr0 down; ip link del cbr0;
ip link set cni0 down; ip link del cni0;
systemctl start kubelet
errordeveloper (Member) commented on Sep 27, 2016

I think this can be closed now, as soon as new packages become available.

[52 hidden items not shown]

kenzhaoyihui commented on Jan 3, 2017

@Dmitry1987 Yeah, thanks for your help. I will check the log again.

krishvoor commented on Jan 14, 2017

Facing the same issue as well.
SELinux is disabled.
OS: Ubuntu 16.04
ARCH: ppc64le
iptables/firewalld: disabled
I natively compiled Kubernetes (release 1.5) and tried kubeadm init. It hangs here:

[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: kubelet service does not exist
[init] Using Kubernetes version: v1.5.2
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[apiclient] Created API client, waiting for the control plane to become ready
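
Given the "[preflight] WARNING: kubelet service does not exist" line above, the control-plane static pods can never be started by a kubelet, so the wait hangs. On a systemd host the first checks would usually be something like the following sketch (the reporter later notes systemctl is not usable with this natively built setup):

systemctl status kubelet                          # is a kubelet unit installed and running?
journalctl -u kubelet --no-pager | tail -n 50     # recent kubelet logs
docker ps                                         # have any control-plane containers appeared?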

kenzhaoyihui commented on Jan 14, 2017

@harsha544 Could you attach the log from /var/log/messages and the output of docker images?

krishvoor commented on Jan 14, 2017

@kenzhaoyihui Nothing in /var/log/syslog. Enclosing the docker images and docker ps -a output (both are empty):

# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

kenzhaoyihui commented on Jan 14, 2017

@harsha544 https://github.com/kenzhaoyihui/kubeadm-images-gcr.io/blob/master/pull_kubernetes_images.sh

This shell script pulls all the Docker images that are needed; could you pull all the images and then execute "kubeadm init"?
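
For illustration, pre-pulling the control-plane images directly from gcr.io would look something like the sketch below; the tags are the ones seen in the docker ps output earlier in this thread, so adjust versions and architecture (e.g. ppc64le) to your platform:

for img in kube-apiserver-amd64:v1.4.0 kube-controller-manager-amd64:v1.4.0 \
           kube-scheduler-amd64:v1.4.0 etcd-amd64:2.2.5 pause-amd64:3.0; do
  docker pull "gcr.io/google_containers/${img}"    # pull each required image ahead of kubeadm init
done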

krishvoor commented on Jan 15, 2017

@kenzhaoyihui Thanks for the URL. I tweaked the script to pull ppc64le Docker images; however, not all of the Docker images appear to be present.

ozbillwang commented on Jan 15, 2017

@harsha544

Be careful with the link and script that @kenzhaoyihui provided; the script substitutes his own images for Google's.

You'd better not run it.

In fact, the solution has already been provided in this ticket and it fixed my issue: the one posted by @benmathews on Sep 28, 2016. If you missed that comment, give it a try.

krishvoor commented on Jan 17, 2017

@SydOps I was cautious enough to pull the Docker images from gcr.io/google_containers/ppc64le..
Given that this is the ppc64le architecture, I built the binaries from the GitHub source, so I don't have the ability to restart via systemctl. However, I'm now following the Kubernetes Ansible approach (https://github.com/kubernetes/contrib/tree/master/ansible) to deploy the K8s cluster across my nodes.

luxas (Member) commented on Jan 17, 2017

@harsha544 Please open a new issue in kubernetes/kubeadm about this.
It's fully possible to solve it, but it requires some manual hacking until we've got ppc64le mainline again; see #38926.

mohamedbouchriha commented on Feb 20, 2017

Thanks @saidiahd, it works for me.

shufanhao (Contributor) commented on Apr 30, 2017

I also hit this issue, even though I have disabled SELinux. It hangs here:

[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
