Description
Hello kubernetes,
I am trying to follow the instructions from the rbd example. After successfully booting a ceph demo cluster (sudo ceph -s on the host displays HEALTH_OK) and manually creating a foo rbd volume formatted in ext4, I cannot start any pod that uses rbd volumes. The rbd2 pod never starts; it stays in ContainerCreating state, as shown by the kubectl get pod output below:
NAME                   READY     STATUS              RESTARTS   AGE
k8s-etcd-127.0.0.1     1/1       Running             0          49m
k8s-master-127.0.0.1   4/4       Running             4          50m
k8s-proxy-127.0.0.1    1/1       Running             0          49m
rbd2                   0/1       ContainerCreating   0          9m
I am using Kubernetes 1.2.1 with Docker 1.9.1 on an Ubuntu 14.04 amd64 host, running the single-node Docker cluster setup.
The output of kubectl describe pods rbd2 is the following:
Name: rbd2
Namespace: default
Node: 127.0.0.1/127.0.0.1
Start Time: Wed, 06 Apr 2016 18:38:22 +0200
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
rbd-rw:
Container ID:
Image: nginx
Image ID:
Port:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
rbdpd:
Type: RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
CephMonitors: [172.17.42.1:6789]
RBDImage: foo
FSType: ext4
RBDPool: rbd
RadosUser: admin
Keyring: /etc/ceph/ceph.client.admin.keyring
SecretRef: &{ceph-secret}
ReadOnly: true
default-token-1ze78:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1ze78
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
7m 7m 1 {default-scheduler } Normal Scheduled Successfully assigned rbd2 to 127.0.0.1
7m 7s 33 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1
7m 7s 33 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: rbd: failed to modprobe rbd error:exit status 1
In the kubelet docker log, I can see the following trace, repeated multiple times.
I0406 16:44:56.885150 8236 rbd.go:89] ceph secret info: key/AQCyJQVXJV4gERAA1q7y4Wi6MiuO8UahSQoIrg==
I0406 16:44:56.887715 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0406 16:44:57.889282 8236 disk_manager.go:56] failed to attach disk
E0406 16:44:57.889295 8236 rbd.go:208] rbd: failed to setup
E0406 16:44:57.889334 8236 kubelet.go:1780] Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1; skipping pod
E0406 16:44:57.889340 8236 pod_workers.go:138] Error syncing pod fa59e744-fc15-11e5-8533-28d2444cbe8c, skipping: rbd: failed to modprobe rbd error:exit status 1
I0406 16:44:58.884709 8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
As I understand the above logs, the kubelet container is trying to run something like modprobe rbd inside itself (or somewhere else?) and that fails. I noticed that there is no modprobe command inside the kubelet container (image: gcr.io/google_containers/hyperkube-amd64:v1.2.1), so I manually ran apt-get update && apt-get install kmod to make that command available inside the container, but without success.
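(As a side note, whether the rbd kernel module is actually available can be checked directly on the host; a small sketch, assuming a stock Ubuntu 14.04 kernel and module path:)

# on the host: check whether the rbd kernel module is loaded, load it if not
lsmod | grep -q '^rbd' || sudo modprobe rbd
# the module file lives in the host module tree, which the hyperkube container lacks
ls /lib/modules/$(uname -r)/kernel/drivers/block/rbd.ko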
My files look like this:
# secret/ceph-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFDeUpRVlhKVjRnRVJBQTFxN3k0V2k2TWl1TzhVYWhTUW9Jcmc9PQo=
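(For context, the key value above is just the base64-encoded admin secret; following the rbd example it can be produced with something along these lines, assuming the default keyring location:)

grep key /etc/ceph/ceph.client.admin.keyring | awk '{print $3}' | base64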
# rbd-pod.yaml
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "rbd2"
spec:
  containers:
    - name: "rbd-rw"
      image: "nginx"
      volumeMounts:
        - mountPath: "/var/www/html"
          name: "rbdpd"
  volumes:
    - name: "rbdpd"
      rbd:
        monitors:
          - "172.17.42.1:6789"
        pool: "rbd"
        image: "foo"
        user: "admin"
        secretRef:
          name: "ceph-secret"
        fsType: "ext4"
        keyring: "/etc/ceph/ceph.client.admin.keyring"
        readOnly: true
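(Both objects were created with kubectl in the usual way; the exact invocation is not important, but it amounts to something like:)

kubectl create -f secret/ceph-secret.yaml
kubectl create -f rbd-pod.yaml
kubectl get pod rbd2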
I have checked that 172.17.42.1:6789 is reachable from the kubernetes cluster (because --net=host is used when booting the kubelet container).
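(For example, a quick reachability check from the host or from inside the kubelet container, assuming netcat is installed:)

nc -vz 172.17.42.1 6789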
How can I mount RBD volumes inside containers as of Kubernetes 1.2.1?
Activity
maclof commented on Apr 6, 2016
@jperville have you tried apt-get installing the ceph-common package?
jperville commented on Apr 6, 2016
Hi @maclof ,
I have installed the ceph-common package from ceph.org, in version 0.94 (hammer), because the documentation recommends version >= 0.87 and Ubuntu 14.04 only ships version 0.80 out of the box.
I successfully ran sudo ceph -s to make sure that the ceph demo cluster is working.
jperville commented on Apr 6, 2016
I have created the rbd volume from the host with no problem, so I think that the ceph-common package on the host is compatible with the ceph version inside the ceph/demo image.
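(For reference, creating and formatting the foo image from the host looks roughly like this; a sketch following the rbd example, where the size and the mapped device path are assumptions:)

rbd create foo --size 1024
sudo rbd map foo
# the mapped device typically shows up as /dev/rbd0, with a /dev/rbd/rbd/foo symlink
sudo mkfs.ext4 /dev/rbd/rbd/foo
sudo rbd unmap foo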
saad-ali commented on Apr 7, 2016
CC @rootfs @kubernetes/rh-storage
rootfs commented on Apr 7, 2016
The modprobe rbd failure is the problem. Which kubelet container image are you using? Can you install modprobe in your image?
jperville commented on Apr 7, 2016
Hi @rootfs, as explained above I am using gcr.io/google_containers/hyperkube-amd64:v1.2.1 as the kubelet image and there was no modprobe installed inside, so I manually ran apt-get update && apt-get install kmod, but I still get the same error, hence this issue. I'm not sure if /sbin is in the path, so I will try symlinking in a few hours when I wake up.
jperville commented on Apr 7, 2016
After installing the kmod package in the container, /sbin/modprobe is available; however, there is no /lib/modules in the container, so invoking modprobe rbd from the kubelet container failed. I worked around the problem with ln -s /rootfs/lib/modules /lib/modules.
The container creation is now stuck at another step. This time it is the rbd executable that is missing. After redeploying the kubelet container with --volume=/sbin/modprobe:/sbin/modprobe:ro --volume=/lib/modules:/lib/modules:ro --volume=/etc/ceph:/etc/ceph:rw added to the docker run command line, and reinstalling ceph-common inside the kubelet container, the progress is now stuck at a later step: every 25 seconds or so the foo rbd device is mapped again (e.g. /dev/rbd0 is mapped, then /dev/rbd1, then /dev/rbd2, etc.) and every time the kubelet container logs a timeout.
jperville commented on Apr 7, 2016
I finally made it work by applying several hacks:
1. Redeploy the kubelet container with --volume=/sbin/modprobe:/sbin/modprobe:ro --volume=/lib/modules:/lib/modules:ro --volume=/etc/ceph:/etc/ceph:rw added to the docker run command line.
2. Install ceph-common inside the kubelet container: curl https://raw.githubusercontent.com/ceph/ceph/master/keys/release.asc | apt-key add - && echo deb http://download.ceph.com/debian-hammer/ jessie main | tee /etc/apt/sources.list.d/ceph.list && apt-get update && apt-get install -y ceph-common
3. Symlink the host rbd devices into the kubelet container: for i in 0 1 2 3 4 5 6 ; do ln -s /rootfs/dev/rbd${i} /dev/rbd${i} ; done
Workaround 3 is necessary because the getDevFromImageAndPool helper returns a device path such as /dev/rbd0, whereas the actual device path (as seen from the kubelet container) is /rootfs/dev/rbd0.
I traced the algorithm of the getDevFromImageAndPool helper using the shell in the kubelet container:
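(The trace itself is not reproduced in this thread. Roughly speaking, the helper scans sysfs for an rbd device whose pool and image match, then builds the device path from the device id; a shell sketch of that lookup, with the pool and image hard-coded for illustration:)

pool=rbd
image=foo
for d in /sys/bus/rbd/devices/*; do
  if [ "$(cat "$d/name")" = "$image" ] && [ "$(cat "$d/pool")" = "$pool" ]; then
    # the helper returns /dev/rbdN, but inside the kubelet container that node
    # only exists under /rootfs/dev, hence the symlink workaround above
    echo "/dev/rbd$(basename "$d")"
  fi
done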
For the moment I will prepare a custom hyperkube image with my workarounds but this is hackish.
rootfs commented on Apr 7, 2016
Nice hack to get it running. For step 3, what if you bind mount the host /dev (i.e. -v /dev:/dev), or is there an issue with that?
jperville commented on Apr 7, 2016
If I bind-mount /dev on the host to /dev in the kubelet container, the kubelet container will mess with the pts devices on my host, resulting in being unable to start new terminals (gnome-terminal fails with a "getpt failed" message) and making it impossible to properly shut down my workstation.
EDIT: after checking the issue tracker, the breakage resulting from bind-mounting /dev from the host into the kubelet container is documented in #18230.
What I finally did was to wrap the hyperkube image like this:
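(The Dockerfile was not preserved in this thread; a minimal sketch of such a wrapper image, based only on the workarounds described above, might look like the following. The hyperkube-rbd tag is made up for illustration.)

cat > Dockerfile <<'EOF'
FROM gcr.io/google_containers/hyperkube-amd64:v1.2.1
# provide modprobe and the ceph/rbd CLI tools (hammer) inside the kubelet image
RUN apt-get update && apt-get install -y kmod curl && \
    curl -s https://raw.githubusercontent.com/ceph/ceph/master/keys/release.asc | apt-key add - && \
    echo deb http://download.ceph.com/debian-hammer/ jessie main > /etc/apt/sources.list.d/ceph.list && \
    apt-get update && apt-get install -y ceph-common
EOF
docker build -t hyperkube-rbd:v1.2.1 .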
And then run the kubelet container like this:
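(Again, the exact command is not preserved; this sketch follows the standard Kubernetes 1.2 single-node Docker instructions, with the extra --volume flags from workaround 1 and the wrapper image above. Treat the flag set as an approximation, not the author's exact command line.)

docker run -d \
  --net=host --pid=host --privileged=true \
  --volume=/:/rootfs:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:rw \
  --volume=/var/lib/kubelet/:/var/lib/kubelet:rw \
  --volume=/var/run:/var/run:rw \
  --volume=/sbin/modprobe:/sbin/modprobe:ro \
  --volume=/lib/modules:/lib/modules:ro \
  --volume=/etc/ceph:/etc/ceph:rw \
  hyperkube-rbd:v1.2.1 \
  /hyperkube kubelet \
    --containerized \
    --hostname-override=127.0.0.1 \
    --api-servers=http://localhost:8080 \
    --config=/etc/kubernetes/manifests \
    --allow-privileged=true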
Then I can use rbd persistent volumes from my dockerized kubernetes setup.
rootfs commented on Apr 7, 2016
I hope that ptmx bug gets fixed.
Does binding /dev/rbd0 work if no rbd device exists yet, or if the rbd device is created after the container is up?
pmorie commented on Apr 7, 2016
@jperville what docker version are you using?
jperville commented on Apr 7, 2016
@pmorie: Using Docker version 1.9.1 (cannot use 1.10 because of layer format changes that are incompatible with a tool we are using).
@rootfs: the ptmx bug is still here (but there is a workaround). Regarding /dev/rbd0, you are right: it works if the device already exists on the host, but not if there is no device yet (Docker creates an empty directory on the node, and that makes the mount fail later). I have to find another hack (maybe wrap the CMD to create some symlinks before booting the containerized kubelet).
kokhang commented on Feb 22, 2017
@rootfs I'm using the kubelet that comes with CoreOS 4.7.3
rootfs commented on Feb 22, 2017
I am not quite familiar with that setup. Do you use docker or rkt? Can you post docker inspect your_kubelet_container?
kokhang commented on Feb 27, 2017
Taking this conversation with rootfs offline. But I'm still curious whether there are any plans to enable these modules by default for RBD block storage.
Add workaround for Ceph in kubelet pod.
yangyuw commented on Apr 13, 2017
@jperville I still get stuck at Could not map image: Timeout after 10s; skipping after using your final solution. Could you tell me how to solve this problem?
marct83 commented on Aug 2, 2017
Also dealing with "Could not map image: Timeout after 10s". Is there a solution?
fejta-bot commented on Jan 2, 2018
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
fejta-bot commented on Feb 7, 2018
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
fejta-bot commented on Mar 9, 2018
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Merge pull request kubernetes#23924 from jsafrane/fix-late-binding-msg