
docker load is not atomic #32170

Closed

Description

cyphar (Contributor)

I cannot reproduce this at the moment (we hit it and I had to purge the state once I realised what had happened), but it appears that docker load is not atomic. If you kill Docker in a certain window during a docker load operation, you end up with the metadata for the image stored in /var/lib/docker/image/btrfs/imagedb/ without the actual layer data being stored.

Here are the debug logs, though they don't tell you much.

Mar 28 11:19:50 e169 systemd[1]: Starting Docker Application Container Engine...
Mar 28 11:19:50 e169 docker_service_helper.sh[14404]: Waiting for docker daemon to start
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.241650297Z" level=debug msg="docker group found. gid: 487"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.241701358Z" level=debug msg="Listener created for HTTP on unix (/var/run/docker.sock)"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.242028758Z" level=debug msg="libcontainerd: containerd connection state change: READY"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.242551475Z" level=debug msg="Using default logging driver json-file"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.242594286Z" level=debug msg="Golang's threads limit set to 57330"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.256946250Z" level=info msg="[graphdriver] using prior storage driver \"btrfs\""
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.256982462Z" level=debug msg="Using graph driver btrfs"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.257140636Z" level=debug msg="Failed to load layer sha256:3ff2154d3784e35e52ce07e3c5fbce0975f3aa0cd0465bcdb20294f2c8170dc0: failed to get diff id for sha256:3ff2154d3784e35e52ce07e3c5fbce0975f3aa0cd0465bcdb20294f2c8170dc0: invalid checksum digest format"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.257418656Z" level=debug msg="Failed to load layer sha256:c69569b15b637f9abe81be0dd8bb7f428d4c096d5f9c7c0ed79d515f20934e2f: failed to get diff id for sha256:c69569b15b637f9abe81be0dd8bb7f428d4c096d5f9c7c0ed79d515f20934e2f: invalid checksum digest format"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.257444251Z" level=debug msg="Max Concurrent Downloads: 3"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.257460428Z" level=debug msg="Max Concurrent Uploads: 5"
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.296573461Z" level=debug msg="Cleaning up old mountid : start."
Mar 28 11:19:50 e169 dockerd[14402]: time="2017-03-28T11:19:50.297060519Z" level=fatal msg="Error starting daemon: layer does not exist"
Mar 28 11:19:50 e169 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Mar 28 11:20:51 e169 docker_service_helper.sh[14404]: Docker is dead
Mar 28 11:20:51 e169 systemd[1]: docker.service: Control process exited, code=exited status=1
Mar 28 11:20:51 e169 systemd[1]: Failed to start Docker Application Container Engine.
Mar 28 11:20:51 e169 systemd[1]: docker.service: Unit entered failed state.
Mar 28 11:20:51 e169 systemd[1]: docker.service: Failed with result 'exit-code'.
Mar 28 11:56:05 e169 systemd[1]: Stopped Docker Application Container Engine.

This is the actual error it dies with, which I believe happens when we call image.NewImageStore (during its restore step) in NewDaemon.

Error starting daemon: layer does not exist
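
For context, the restore step walks every image recorded in imagedb and resolves it against the layer store; if any referenced layer is missing, daemon startup aborts entirely. Here is a minimal Go sketch of that failure mode, using illustrative stand-in types rather than the real moby ones:

package main

import (
    "errors"
    "fmt"
)

// Stand-ins for the real moby types; all names here are illustrative.
type ChainID string

var errLayerDoesNotExist = errors.New("layer does not exist")

// layerStore maps a chain ID to on-disk layer data.
type layerStore map[ChainID]struct{}

func (ls layerStore) Get(id ChainID) error {
    if _, ok := ls[id]; !ok {
        return errLayerDoesNotExist
    }
    return nil
}

// restore mimics the fatal startup path: every image in imagedb must
// resolve to an existing layer, otherwise daemon start fails outright.
func restore(imagedb map[string]ChainID, ls layerStore) error {
    for imageID, topLayer := range imagedb {
        if err := ls.Get(topLayer); err != nil {
            return fmt.Errorf("restore image %s: %w", imageID, err)
        }
    }
    return nil
}

func main() {
    ls := layerStore{"sha256:aaa": {}}
    imagedb := map[string]ChainID{
        "img1": "sha256:aaa",
        "img2": "sha256:bbb", // metadata survived the crash, layer data did not
    }
    if err := restore(imagedb, ls); err != nil {
        fmt.Println("Error starting daemon:", err)
    }
}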

Output of docker version:

Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.1
 Git commit:   78d1802
 Built:        Wed Feb 15 15:00:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.1
 Git commit:   78d1802
 Built:        Wed Feb 15 15:00:28 2017
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 7
Server Version: 1.12.6
Storage Driver: btrfs
 Build Version: Btrfs v3.18.2+20150430
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null overlay bridge
Swarm: inactive
Runtimes: oci runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 4.4.52-1-default
Operating System: SUSE Container as a Service Platform 1.0
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.796 GiB
Name: e169
ID: 7N3T:HCTD:HQPH:T7A2:L2CS:VAW5:OKEJ:SSOA:2HEV:JQSZ:FHYQ:X4U5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
Insecure Registries:
 127.0.0.0/8

tonistiigi (Member) commented on Mar 28, 2017

@cyphar Are you sure the image that caused the problem was created by load? That case should be protected against by reference counting. Also note that the diffIDs are not the directory names in /var/lib/docker/image/btrfs/imagedb/ or /var/lib/docker/image/btrfs/layerdb/. Let me know if you can reproduce it, and what version of Docker was used to create the loaded tarball. Was there something running on the system that might try to remove images at the same time the load finished?
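
For context, the reference counting mentioned here is what should prevent a concurrent docker rmi from deleting layers that an in-flight load still needs. A rough Go illustration of the idea (not the actual moby layer store):

package main

import (
    "fmt"
    "sync"
)

// refCountedLayer is an illustrative stand-in: a layer's on-disk data
// is only removed once nothing (no image, no in-flight load) holds a
// reference to it anymore.
type refCountedLayer struct {
    mu        sync.Mutex
    refs      int
    onRelease func() // deletes the layer's on-disk data
}

func (l *refCountedLayer) Retain() {
    l.mu.Lock()
    defer l.mu.Unlock()
    l.refs++
}

func (l *refCountedLayer) Release() {
    l.mu.Lock()
    defer l.mu.Unlock()
    if l.refs--; l.refs == 0 && l.onRelease != nil {
        l.onRelease()
    }
}

func main() {
    layer := &refCountedLayer{onRelease: func() { fmt.Println("layer data deleted") }}
    layer.Retain()  // docker load takes a reference
    layer.Retain()  // the loaded image takes a reference
    layer.Release() // docker rmi drops the image's reference
    layer.Release() // load finishes; only now is the data deleted
}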

mjura commented on Mar 31, 2017

@tonistiigi: I have found a way to reproduce this issue.

  1. Load an image
  2. Remove the image
  3. Load the same image again
  4. Remove the image again
  5. Force-reset the VM running the Docker service
  6. When the VM starts again, the Docker service fails to start

I used the same environment as @cyphar.

cyphar (Contributor, Author) commented on Jul 3, 2017

I'm going to investigate this further this week.

vrothberg commented on Jul 4, 2017

I've observed similar behaviour when using pull instead of load. I performed the following steps in a clean environment:

  • $ docker pull opensuse:tumbleweed
  • $ docker rmi $image
  • $ docker pull opensuse:tumbleweed
  • $ docker rmi $image
  • Force-reboot the VM

Starting the daemon with --debug gives the following logs:

level=debug msg="Using graph driver btrfs"
[...]
level=debug msg="Cleaning up old mountid : start."
level=fatal msg="Error starting daemon: layer does not exist"
docker.service: Main process exited, code=exited, status=1/FAILURE

I can do some detailed debugging tomorrow.

cyphar (Contributor, Author) commented on Jul 4, 2017

Alright, this is what the actual error looks like when you reproduce it with a multi-layer image. With a single-layer image like alpine it still fails, but you don't get the debug messages about the invalid digest.

[pid  2239] 02:57:43.751906 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid  2244] 02:57:43.751917 <... openat resumed> ) = 7</var/lib/docker/image/btrfs/layerdb/sha256/1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62/diff>
[pid  2244] 02:57:43.751935 fstat(7</var/lib/docker/image/btrfs/layerdb/sha256/1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62/diff>, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  2244] 02:57:43.751963 read(7</var/lib/docker/image/btrfs/layerdb/sha256/1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62/diff>, "", 512) = 0
[pid  2244] 02:57:43.751987 close(7</var/lib/docker/image/btrfs/layerdb/sha256/1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62/diff> <unfinished ...>
[pid  2239] 02:57:43.751998 <... select resumed> ) = 0 (Timeout)
[pid  2244] 02:57:43.752007 <... close resumed> ) = 0
[pid  2239] 02:57:43.752015 select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid  2244] 02:57:43.752050 write(2</dockerstrace>, "time=\"2017-07-05T02:57:43.752027"..., 286time="2017-07-05T02:57:43.752027678+10:00" level=debug msg="Failed to load layer sha256:1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62: failed to get diff id for sha256:1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62: invalid checksum digest format" 

So the problem is that the file /var/lib/docker/image/btrfs/layerdb/sha256/1118f33a0ee7a874a04318248a886b2bdaf44cba286644ab7ded870aefe64b62/diff exists but is empty -- even though Docker writes these files atomically using renames. I noticed that if I call sync before the forced reboot the error doesn't happen -- which tells me that btrfs is doing something very dodgy here: the rename reaches disk before the file contents do.

I'm going to see if I can create a reproducer outside of Docker that causes the same behaviour. It's a bit tricky because I can't get the full strace output without calling sync (which makes the problem go away).
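
For reference, the pattern that closes this window is to fsync the file before renaming it into place, and then fsync the directory so the rename itself is durable. A minimal Go sketch of the general technique (not of what moby actually does; the file name in main is just an example):

package main

import (
    "os"
    "path/filepath"
)

// atomicWriteFile writes data to a temp file, fsyncs it so the
// contents reach disk before the rename, renames it into place, and
// finally fsyncs the directory to make the rename durable. Dropping
// the two Sync calls reproduces the failure mode above: after a
// crash, the rename may have been persisted while the data was not,
// leaving an empty file.
func atomicWriteFile(path string, data []byte, perm os.FileMode) error {
    dir := filepath.Dir(path)

    tmp, err := os.CreateTemp(dir, ".tmp-*")
    if err != nil {
        return err
    }
    defer os.Remove(tmp.Name()) // no-op once the rename succeeds

    if _, err := tmp.Write(data); err != nil {
        tmp.Close()
        return err
    }
    if err := tmp.Sync(); err != nil { // data must hit disk before the rename
        tmp.Close()
        return err
    }
    if err := tmp.Chmod(perm); err != nil {
        tmp.Close()
        return err
    }
    if err := tmp.Close(); err != nil {
        return err
    }
    if err := os.Rename(tmp.Name(), path); err != nil {
        return err
    }

    // Persist the rename itself.
    d, err := os.Open(dir)
    if err != nil {
        return err
    }
    defer d.Close()
    return d.Sync()
}

func main() {
    // Example usage with a made-up diff ID.
    if err := atomicWriteFile("diff", []byte("sha256:example\n"), 0o644); err != nil {
        panic(err)
    }
}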

cyphar (Contributor, Author) commented on Jul 5, 2017

I believe this issue was actually fixed by c37bd10. You still have the problem that a new Docker will die if you try to run it on top of a corrupted imagedb, but the corruption itself appears to have been fixed in 1.13.0.
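
On the remaining problem (the daemon dying on an already-corrupted imagedb), a more tolerant restore would log and skip broken images instead of aborting startup. A hypothetical fragment, reusing the ChainID/layerStore stand-ins from the earlier sketch plus the standard log package; this is not what c37bd10 does:

// restoreTolerant is a hypothetical variant of restore from the
// earlier sketch: unresolvable images are logged and skipped so the
// daemon can still come up, instead of failing with a fatal error.
func restoreTolerant(imagedb map[string]ChainID, ls layerStore) []string {
    var skipped []string
    for imageID, topLayer := range imagedb {
        if err := ls.Get(topLayer); err != nil {
            log.Printf("skipping image %s: %v", imageID, err)
            skipped = append(skipped, imageID)
        }
    }
    return skipped
}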

thaJeztah added this to the 1.13.0 milestone on Jul 6, 2017
thaJeztah (Member) commented on Jul 6, 2017

Thanks @cyphar - I added it to the 1.13 milestone for easier discovery 👍
