How to use RBD volumes (pod fails to start with error "rbd: failed to modprobe rbd") #23924

jperville · 2016-04-06T16:55:20Z

Hello kubernetes,

I am trying to follow the instructions from the rbd example. After successfully booting a ceph demo cluster (sudo ceph -s on the host displays HEALTH_OK) and manually creating a foo rbd volume formatted in ext4, I cannot start any pod that uses rbd volumes.

The rbd2 pod never starts; it stays in the ContainerCreating state, as shown by the kubectl get pod output below:

NAME                   READY     STATUS              RESTARTS   AGE
k8s-etcd-127.0.0.1     1/1       Running             0          49m
k8s-master-127.0.0.1   4/4       Running             4          50m
k8s-proxy-127.0.0.1    1/1       Running             0          49m
rbd2                   0/1       ContainerCreating   0          9m

I am using kubernetes 1.2.1 with docker 1.9.1 on ubuntu 14.04 amd64 host using the single-node docker cluster.

The output of kubectl describe pods rbd2 is the following:

Name:       rbd2
Namespace:  default
Node:       127.0.0.1/127.0.0.1
Start Time: Wed, 06 Apr 2016 18:38:22 +0200
Labels:     <none>
Status:     Pending
IP:     
Controllers:    <none>
Containers:
  rbd-rw:
    Container ID:   
    Image:      nginx
    Image ID:       
    Port:       
    QoS Tier:
      cpu:      BestEffort
      memory:       BestEffort
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
Conditions:
  Type      Status
  Ready     False 
Volumes:
  rbdpd:
    Type:       RBD (a Rados Block Device mount on the host that shares a pod's lifetime)
    CephMonitors:   [172.17.42.1:6789]
    RBDImage:       foo
    FSType:     ext4
    RBDPool:        rbd
    RadosUser:      admin
    Keyring:        /etc/ceph/ceph.client.admin.keyring
    SecretRef:      &{ceph-secret}
    ReadOnly:       true
  default-token-1ze78:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-1ze78
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  7m        7m      1   {default-scheduler }            Normal      Scheduled   Successfully assigned rbd2 to 127.0.0.1
  7m        7s      33  {kubelet 127.0.0.1}         Warning     FailedMount Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1
  7m        7s      33  {kubelet 127.0.0.1}         Warning     FailedSync  Error syncing pod, skipping: rbd: failed to modprobe rbd error:exit status 1

In the kubelet docker log, I can see the following trace, repeated multiple times.

I0406 16:44:56.885150    8236 rbd.go:89] ceph secret info: key/AQCyJQVXJV4gERAA1q7y4Wi6MiuO8UahSQoIrg==
I0406 16:44:56.887715    8236 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0406 16:44:57.889282    8236 disk_manager.go:56] failed to attach disk
E0406 16:44:57.889295    8236 rbd.go:208] rbd: failed to setup
E0406 16:44:57.889334    8236 kubelet.go:1780] Unable to mount volumes for pod "rbd2_default(fa59e744-fc15-11e5-8533-28d2444cbe8c)": rbd: failed to modprobe rbd error:exit status 1; skipping pod
E0406 16:44:57.889340    8236 pod_workers.go:138] Error syncing pod fa59e744-fc15-11e5-8533-28d2444cbe8c, skipping: rbd: failed to modprobe rbd error:exit status 1
I0406 16:44:58.884709    8236 nsenter_mount.go:179] Failed findmnt command: exit status 1

As I understand the above logs, the kubelet container is trying to run something like modprobe rbd inside itself (or somewhere else?) and that fails. I noticed that there is no modprobe command inside the kubelet container (image: gcr.io/google_containers/hyperkube-amd64:v1.2.1), so I manually ran apt-get update && apt-get install kmod to make that command appear inside the container, but without success.

My files look like this:

# secret/ceph-secret.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFDeUpRVlhKVjRnRVJBQTFxN3k0V2k2TWl1TzhVYWhTUW9Jcmc9PQo=

# rbd-pod.yaml 
apiVersion: "v1"
kind: "Pod"
metadata: 
  name: "rbd2"
spec: 
  containers: 
    - name: "rbd-rw"
      image: "nginx"
      volumeMounts: 
        - mountPath: "/var/www/html"
          name: "rbdpd"
  volumes: 
    - name: "rbdpd"
      rbd: 
        monitors: 
          - "172.17.42.1:6789"
        pool: "rbd"
        image: "foo"
        user: "admin"
        secretRef: 
          name: "ceph-secret"
        fsType: "ext4"
        keyring: "/etc/ceph/ceph.client.admin.keyring"
        readOnly: true
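
A side observation on the ceph-secret.yaml above: the base64 value in its data.key field decodes to the ceph key followed by a newline character (what echo without -n would produce). Whether that newline plays any role in the failures discussed in this issue is not established anywhere in the thread. The sketch below only shows one way to check for it and to encode a key without it; both function names are made up for illustration.

```shell
# Illustrative helpers (hypothetical names, not part of kubectl or ceph).

# encode_ceph_key KEY
# Prints KEY base64-encoded with no trailing newline inside the encoded value.
encode_ceph_key() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# key_has_trailing_newline BASE64_VALUE
# Succeeds (exit 0) when the decoded value ends with a newline byte (0x0a).
key_has_trailing_newline() {
  local last
  last=$(printf '%s' "$1" | base64 -d | tail -c 1 | od -An -tx1 | tr -d ' \n')
  [ "$last" = "0a" ]
}
```

For example, key_has_trailing_newline succeeds for the value in the secret above, and fails for the output of encode_ceph_key applied to the key shown in the kubelet log.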

I have checked that 172.17.42.1:6789 is reachable from the kubernetes cluster (because of using --net=host when booting the kubelet container).

How can I mount RBD volumes inside container as of kubernetes 1.2.1?

maclof · 2016-04-06T17:40:25Z

@jperville have you tried apt-get installing the ceph-common package?

jperville · 2016-04-06T18:01:53Z

Hi @maclof ,

I have installed the ceph-common package from ceph.org, in version 0.94 hammer because the documentation recommends version >= 0.87 and Ubuntu 14.04 only packages version 0.80 out of the box.

I successfully ran sudo ceph -s to make sure that the ceph demo cluster is working.

jperville · 2016-04-06T18:08:17Z

I have created the rbd volume from the host with no problem so I think that the ceph-common package on host is compatible with the ceph inside the ceph/demo image.

saad-ali · 2016-04-07T00:42:40Z

CC @rootfs @kubernetes/rh-storage

rootfs · 2016-04-07T00:59:01Z

The modprobe rbd failure is the problem. Which kubelet container image are you using? Can you install modprobe in your image?

jperville · 2016-04-07T02:25:54Z

Hi @rootfs, as explained above, I am using gcr.io/google_containers/hyperkube-amd64:v1.2.1 as the kubelet image. There was no modprobe installed inside, so I manually ran apt-get update && apt-get install kmod, but I still got the same error, hence this issue.

I'm not sure if /sbin is in the path so I will try symlinking in a few hours when I wake up.

jperville · 2016-04-07T08:00:19Z

After installing the kmod package in the container, /sbin/modprobe is available, however there is no /lib/modules in the container so invoking modprobe rbd from the kubelet container failed; I worked around the problem by ln -s /rootfs/lib/modules /lib/modules.
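
The two prerequisites described above (a modprobe binary and a module tree for the running kernel) can be checked before starting the kubelet. This is a minimal sketch, not something kubelet does itself; check_rbd_prereqs and its ROOT and KERNEL_RELEASE parameters are hypothetical names, and it assumes modules live under lib/modules/<kernel release>, which is where modprobe normally looks.

```shell
# check_rbd_prereqs [ROOT] [KERNEL_RELEASE]
# ROOT is the filesystem root to inspect ("/" when run inside the kubelet
# container). KERNEL_RELEASE defaults to the running kernel (uname -r).
# Prints one "missing: ..." line per absent prerequisite and returns 1 if
# anything is missing, 0 otherwise.
check_rbd_prereqs() {
  local root release status
  root="${1:-/}"
  release="${2:-$(uname -r)}"
  status=0
  root="${root%/}"   # "/" becomes "" so the paths below stay well-formed

  if [ ! -x "${root}/sbin/modprobe" ]; then
    echo "missing: ${root}/sbin/modprobe"
    status=1
  fi
  if [ ! -d "${root}/lib/modules/${release}" ]; then
    echo "missing: ${root}/lib/modules/${release}"
    status=1
  fi
  return "$status"
}
```

Run as check_rbd_prereqs / inside the container, it would have reported the missing /lib/modules tree that the symlink above works around.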

The container creation is now stuck at another step.

Unable to mount volumes for pod "rbd2_default(199392ab-fc91-11e5-8533-28d2444cbe8c)": rbd: map failed executable file not found in $PATH

This time it is the 'rbd' executable missing. After redeploying the kubelet container with --volume=/sbin/modprobe:/sbin/modprobe:ro --volume=/lib/modules:/lib/modules:ro --volume=/etc/ceph:/etc/ceph:rw added to the docker run command-line, and reinstalling ceph-common inside the kubelet container, the progress is now stuck at:

E0407 07:54:12.034478   22122 kubelet.go:1780] Unable to mount volumes for pod "rbd2_default(e3f1e923-fc95-11e5-9dcb-28d2444cbe8c)": Could not map image: Timeout after 10s; skipping pod
E0407 07:54:12.034485   22122 pod_workers.go:138] Error syncing pod e3f1e923-fc95-11e5-9dcb-28d2444cbe8c, skipping: Could not map image: Timeout after 10s
I0407 07:54:26.592172   22122 rbd.go:89] ceph secret info: key/AQCyJQVXJV4gERAA1q7y4Wi6MiuO8UahSQoIrg==
I0407 07:54:26.594563   22122 nsenter_mount.go:179] Failed findmnt command: exit status 1
I0407 07:54:27.597245   22122 rbd_util.go:229] rbd: map mon 172.17.42.1:6789
I0407 07:54:30.593309   22122 nsenter_mount.go:179] Failed findmnt command: exit status 1

Every 25 seconds or so the 'foo' rbd device is mapped again (e.g. /dev/rbd0 is mapped, then /dev/rbd1, then /dev/rbd2, etc.) and every time the kubelet container logs the timeout.

jperville · 2016-04-07T10:36:36Z

I finally made it work by applying several hacks:

1. running the kubelet container with --volume=/sbin/modprobe:/sbin/modprobe:ro --volume=/lib/modules:/lib/modules:ro --volume=/etc/ceph:/etc/ceph:rw
2. installing ceph-common manually in the kubelet container: curl https://raw.githubusercontent.com/ceph/ceph/master/keys/release.asc | apt-key add - && echo deb http://download.ceph.com/debian-hammer/ jessie main | tee /etc/apt/sources.list.d/ceph.list && apt-get update && apt-get install -y ceph-common
3. pre-creating some symlinks so that /dev/rbdX redirects to /rootfs/dev/rbdX: for i in 0 1 2 3 4 5 6 ; do ln -s /rootfs/dev/rbd${i} /dev/rbd${i} ; done

Workaround 3 is necessary because the getDevFromImageAndPool helper returns a device path such as /dev/rbd0, whereas the actual device path (as seen from the kubelet container) is /rootfs/dev/rbd0.

I traced the algorithm of the getDevFromImageAndPool helper using the shell in the kubelet container:

# cd /sys/bus/rbd/devices/
/sys/bus/rbd/devices# ls
0
/sys/bus/rbd/devices# name=0
/sys/bus/rbd/devices# cd ${name}
/sys/bus/rbd/devices/0# cat pool name 
rbd
foo
/sys/bus/rbd/devices/0# devicePath="/dev/rbd${name}" # pool 'rbd' and image 'foo' match
/sys/bus/rbd/devices/0# echo ${devicePath}
/dev/rbd0
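
The manual trace above can be wrapped into a small function. This is a shell re-creation of the steps shown in the trace, not the actual Go helper from the kubelet source; find_rbd_device and its optional DEV_PREFIX argument are hypothetical, the latter only illustrating how a prefix such as /rootfs could be applied to the result.

```shell
# find_rbd_device SYSFS_DIR POOL IMAGE [DEV_PREFIX]
# Walks SYSFS_DIR (normally /sys/bus/rbd/devices), where each mapped device
# has a numbered directory containing "pool" and "name" files, and prints the
# device path of the first entry matching POOL and IMAGE.
# Returns 1 when no entry matches.
find_rbd_device() {
  local sysfs_dir pool image dev_prefix entry id
  sysfs_dir="$1"
  pool="$2"
  image="$3"
  dev_prefix="${4:-}"

  for entry in "$sysfs_dir"/*; do
    [ -d "$entry" ] || continue
    [ -r "$entry/pool" ] && [ -r "$entry/name" ] || continue
    if [ "$(cat "$entry/pool")" = "$pool" ] &&
       [ "$(cat "$entry/name")" = "$image" ]; then
      id="${entry##*/}"                     # directory name is the device number
      echo "${dev_prefix}/dev/rbd${id}"
      return 0
    fi
  done
  return 1
}
```

With the state shown in the trace, find_rbd_device /sys/bus/rbd/devices rbd foo prints /dev/rbd0, and passing /rootfs as the fourth argument prints /rootfs/dev/rbd0.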

For the moment I will prepare a custom hyperkube image with my workarounds but this is hackish.

rootfs · 2016-04-07T13:11:28Z

Nice hack to get it running. For step 3, what if you bind mount host /dev (i.e. -v /dev:/dev), or is there an issue with that?

jperville · 2016-04-07T13:22:52Z

If I bind-mount /dev on the host to /dev on the kubelet container, the kubelet container messes up the pts on my host, which makes it impossible to start new terminals (gnome-terminal fails with the "getpt failed" message) or to properly shut down my workstations.

EDIT: after checking the issue tracker, the breakage resulting from bind-mounting /dev from host into the kubelet container is documented in #18230.

What I finally did was to wrap the hyperkube image like this:

# apply hacks from https://github.com/kubernetes/kubernetes/issues/23924#issuecomment-206803980
# so that pods that use rbd persistent resources work in the single-node docker setup.
# Build with the following command: `docker build -t custom/hyperkube-amd64:v1.2.1 .`

FROM gcr.io/google_containers/hyperkube-amd64:v1.2.1

RUN curl https://raw.githubusercontent.com/ceph/ceph/master/keys/release.asc | apt-key add - && \
    echo deb http://download.ceph.com/debian-hammer/ jessie main | tee /etc/apt/sources.list.d/ceph.list && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -q -y ceph-common && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

And then run the kubelet container like this:

    # Notes on the volumes and image used below (a comment placed after a
    # trailing backslash would break the line continuation, so they live here):
    #   /sys            necessary to do mount from container
    #   /sbin/modprobe  to skip having to install it in the container
    #   /lib/modules    to make `modprobe rbd` work
    #   /etc/ceph       to copy ceph config from host
    #   /dev/rbd0       workaround for point 3 above
    #   custom/hyperkube-amd64 is the image with ceph-common vendored-in
    docker run \
      --volume=/:/rootfs:ro \
      --volume=/sys:/sys:rw \
      --volume=/var/lib/docker/:/var/lib/docker:rw \
      --volume=/var/lib/kubelet/:/var/lib/kubelet:rw \
      --volume=/var/run:/var/run:rw \
      --volume=/sbin/modprobe:/sbin/modprobe:ro \
      --volume=/lib/modules:/lib/modules:ro \
      --volume=/etc/ceph:/etc/ceph:ro \
      --volume=/dev/rbd0:/rootfs/dev/rbd0:ro \
      --net=host \
      --pid=host \
      --privileged=true \
      --name=kubelet \
      -d \
      custom/hyperkube-amd64:v${K8S_VERSION} \
      /hyperkube kubelet \
      --containerized \
      --hostname-override="127.0.0.1" \
      --address="0.0.0.0" \
      --api-servers=http://localhost:8080 \
      --config=/etc/kubernetes/manifests \
      --cluster-dns=10.0.0.10 \
      --cluster-domain=cluster.local \
      --allow-privileged=true --v=2

Then I can use rbd persistent volumes from my dockerized kubernetes setup.

rootfs · 2016-04-07T14:03:55Z

I hope that ptmx bug is fixed.

Does binding /dev/rbd0 work if no rbd device exists yet, or if the rbd device is created after the container is up?

pmorie · 2016-04-07T14:05:55Z

@jperville what docker version are you using?

jperville · 2016-04-07T14:21:10Z

@pmorie : Using docker version 1.9.1 (cannot use 1.10 because of layer format changes that are incompatible with a tool we are using).

@rootfs : ptmx bug still here (but there is a workaround); regarding /dev/rbd0 you are right, it works if the device already exists on the host but not if there is no device yet (docker creates an empty directory on the node, and that makes the mount fail later). Have to find another hack (maybe wrap the CMD to create some symlinks before booting containerized kubelet).

pmorie · 2016-04-07T14:24:45Z

@jperville Ok, the issue w/ pseudoterminals is supposed to be resolved in 1.10

rootfs · 2016-04-07T14:26:09Z

@jperville moby/moby#16639

jperville · 2016-04-07T15:10:59Z

I ended up adding a wrapper into my custom hyperkube image for now.
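
The wrapper itself is not posted in this thread. The following is only a guess at what such a wrapper could look like, based on the symlink loop from the earlier comment: it pre-creates the links (dangling links are fine, so the devices do not need to exist yet) and then hands over to the kubelet. link_rbd_devices and its parameters are hypothetical names.

```shell
# link_rbd_devices HOST_DEV CONTAINER_DEV [COUNT]
# Creates CONTAINER_DEV/rbdN -> HOST_DEV/rbdN for N in 0..COUNT-1.
# The links may dangle until the corresponding device is actually mapped.
link_rbd_devices() {
  local host_dev container_dev count i
  host_dev="$1"
  container_dev="$2"
  count="${3:-8}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s symbolic link, -f replace an existing link, -n do not follow an
    # existing link that points at a directory
    ln -sfn "${host_dev}/rbd${i}" "${container_dev}/rbd${i}"
    i=$((i + 1))
  done
}

# Hypothetical use as a container entrypoint (left commented out on purpose):
#   link_rbd_devices /rootfs/dev /dev 8
#   exec /hyperkube kubelet "$@"
```

Because the function replaces existing links, running it again on a container restart is harmless.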

Would it help to add some code to the getDevFromImageAndPool helper so that it understands the --containerized option (passed to kubelet) and calculates a working path? (I have no Go knowledge, sadly.)

rootfs · 2016-08-29T14:09:43Z

@jperville In containerized openshift, the node (kubelet) bind-mounts /sys. rbd map works in this environment.

kokhang · 2017-02-22T18:24:43Z

Hi guys, I am also having this issue, which I can work around by adding args to kubelet to mount /sbin/modprobe and /lib/modules. Is there any plan to fix this so that we don't have to manually feed this workaround to kubelet?

rootfs · 2017-02-22T19:12:05Z

@kokhang which kubelet distribution are you using?

kokhang · 2017-02-22T20:01:31Z

@rootfs I'm using the kubelet that comes with coreos 4.7.3

rootfs · 2017-02-22T21:18:27Z

I am not quite familiar with that setup. Do you use docker or rkt? Can you post the output of docker inspect your_kubelet_container?

kokhang · 2017-02-27T23:53:29Z

Taking this conversation with rootfs offline. But I'm still curious whether there are any plans to enable these modules by default for RBD block storage.

yangyuw · 2017-04-13T08:45:06Z

@jperville I am still stuck at "Could not map image: Timeout after 10s; skipping" after using your final solution. Could you tell me how to solve this problem?

marct83 · 2017-08-02T14:06:32Z

Also dealing with "Could not map image: Timeout after 10s". Is there a solution?

fejta-bot · 2018-01-02T06:53:36Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot · 2018-02-07T12:28:40Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-03-09T13:13:32Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

saad-ali added sig/storage Categorizes an issue or PR as relevant to SIG Storage. team/cluster labels Apr 7, 2016

untoreh mentioned this issue Jun 16, 2016

Kubelet dependencies on host CLI tools coreos/coreos-kubernetes#287

Open

childsb assigned rootfs Jul 14, 2016

matchstick added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 12, 2016

JrCs mentioned this issue Sep 21, 2016

Unable to mount Ceph RBD volume inside pods create by kubernetes coreos/bugs#1579

Closed

jingxu97 mentioned this issue Dec 1, 2016

Failed to mount RBD to pods #36905

Closed

kokhang mentioned this issue Feb 22, 2017

Kubelet needs to mount modprobe command as a volume for pods to mount rbd blocks rook/rook#423

Closed

localghost pushed a commit to localghost/rancher-catalog that referenced this issue Mar 17, 2017

Add workaround for Ceph in kubelet pod.

b9f62fb

kubernetes/kubernetes#23924

isaldarriaga mentioned this issue Jul 19, 2017

Error running OSD pods with Juju / Canonical distribution of kubernetes Charm / LXD / LXC / Ubuntu 16.04 LTS rook/rook#824

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2018

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 7, 2018

k8s-ci-robot closed this as completed Mar 9, 2018
