Finding the layers and layer sizes for each Docker image

For research purposes I'm trying to crawl the public Docker registry ( https://registry.hub.docker.com/ ) and find out 1) how many layers an average image has and 2) the sizes of these layers to get an idea of the distribution.

I have studied the API, the public libraries, and the details on GitHub, but I can't find any method to:

  • retrieve all the public repositories/images (even if those are thousands I still need a starting list to iterate through)
  • find all the layers of an image
  • find the size for a layer (so not an image but for the individual layer).

Can anyone help me find a way to retrieve this information?

Thank you!

EDIT: Is anyone able to verify that searching for '*' in the Docker registry returns all the repositories, and not just anything that mentions '*' anywhere? https://registry.hub.docker.com/search?q=*

Answers


You can find the layers of the images in the folder /var/lib/docker/aufs/layers, provided the storage driver is configured as aufs (the default option).

Example:

 docker ps -a
 CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
 0ca502fa6aae        ubuntu              "/bin/bash"         44 minutes ago      Exited (0) 44 seconds ago                       DockerTest

Now, to view the layers of the containers that were created with the image "ubuntu", go to the /var/lib/docker/aufs/layers directory and cat the file whose name starts with the container ID (here it is 0ca502fa6aae*):

 root@viswesn-vm2:/var/lib/docker/aufs/layers# cat 0ca502fa6aaefc89f690736609b54b2f0fdebfe8452902ca383020e3b0d266f9-init
 d2a0ecffe6fa4ef3de9646a75cc629bbd9da7eead7f767cb810f9808d6b3ecb6
 29460ac934423a55802fcad24856827050697b4a9f33550bd93c82762fb6db8f
 b670fb0c7ecd3d2c401fbfd1fa4d7a872fbada0a4b8c2516d0be18911c6b25d6
 83e4dde6b9cfddf46b75a07ec8d65ad87a748b98cf27de7d5b3298c1f3455ae4

This shows the same layers as running:

root@viswesn-vm2:/var/lib/docker/aufs/layers# docker history ubuntu
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
d2a0ecffe6fa        13 days ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
29460ac93442        13 days ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
b670fb0c7ecd        13 days ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
83e4dde6b9cf        13 days ago         /bin/sh -c #(nop) ADD file:c8f078961a543cdefa   188.2 MB

To view the full layer IDs, add the --no-trunc option to the history command:

docker history --no-trunc ubuntu
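
To study the distribution of layer sizes, the SIZE column has to be converted into plain byte counts. The following is a minimal Python sketch, assuming the sizes use decimal units (1 kB = 1000 bytes) as in the output above. The helper name parse_size is hypothetical and not part of Docker.

```python
import re
from decimal import Decimal

# Decimal multipliers (assumption: 1 kB = 1000 bytes, as Docker prints sizes).
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_size(text):
    """Convert a size such as '194.5 kB' or '1.45GB' to a byte count."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    # Decimal avoids float rounding surprises such as 1.895 * 1000.
    return int(Decimal(number) * UNITS[unit.upper()])

# Sizes copied from the `docker history ubuntu` output above.
sizes = [parse_size(s) for s in ["0 B", "1.895 kB", "194.5 kB", "188.2 MB"]]
print(sum(sizes))
```

Because docker history rounds the values it prints, the byte counts recovered this way are approximate.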

Here is a good article about Show Layers of Docker Image

You can first find the image ID:

$ docker images -a

Then find its layers and their sizes:

$ docker history --no-trunc <Image ID>

Note: I'm using Docker version 1.13.1

$ docker -v
Docker version 1.13.1, build 092cba3

They have a very good answer here: https://stackoverflow.com/a/32455275/165865

Just run the image below:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t

This will inspect the docker image and print the layers:

$ docker image inspect nginx -f '{{.RootFS.Layers}}'
[sha256:d626a8ad97a1f9c1f2c4db3814751ada64f60aed927764a3f994fcd88363b659 sha256:82b81d779f8352b20e52295afc6d0eab7e61c0ec7af96d85b8cda7800285d97d sha256:7ab428981537aa7d0c79bc1acbf208c71e57d9678f7deca4267cc03fba26b9c8]
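
To count layers across many images, the same data can be read from the JSON that docker image inspect prints when no format is given. This is a rough Python sketch: the function names are hypothetical, the sample string only mimics the relevant part of the inspect output, and inspect_image needs a local Docker daemon, so it is defined but not called here.

```python
import json
import subprocess

def layer_ids(inspect_output):
    """Return the RootFS layer digests from `docker image inspect` JSON."""
    images = json.loads(inspect_output)  # inspect prints a JSON array
    return images[0]["RootFS"]["Layers"]

def inspect_image(name):
    """Run `docker image inspect NAME` (requires a running Docker daemon)."""
    result = subprocess.run(
        ["docker", "image", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Trimmed-down sample holding only the field used here; the digests are
# the ones shown for nginx above.
sample = json.dumps([{"RootFS": {"Type": "layers", "Layers": [
    "sha256:d626a8ad97a1f9c1f2c4db3814751ada64f60aed927764a3f994fcd88363b659",
    "sha256:82b81d779f8352b20e52295afc6d0eab7e61c0ec7af96d85b8cda7800285d97d",
    "sha256:7ab428981537aa7d0c79bc1acbf208c71e57d9678f7deca4267cc03fba26b9c8",
]}}])
print(len(layer_ids(sample)))
```

With Docker available, layer_ids(inspect_image("nginx")) would return the list for a pulled image.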

In my opinion, docker history <image> is sufficient. This returns the size of each layer.

$ docker history jenkinsci-jnlp-slave:2019-1-9c
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
93f48953d298        42 minutes ago      /bin/sh -c #(nop)  USER jenkins                 0B
6305b07d4650        42 minutes ago      /bin/sh -c chown jenkins:jenkins -R /home/je…   1.45GB

What surprised me is that just changing the owner created a huge blob (presumably because every file whose ownership changes is copied in full into the new layer).


  1. https://hub.docker.com/search?q=* shows all the images in the entire Docker Hub. It's not possible to get this via the search command, as it doesn't accept wildcards.

  2. As of v1.10 you can find all the layers in an image by pulling it and using these commands:

    sudo docker pull ubuntu
    ID=$(sudo docker inspect -f '{{.Id}}' ubuntu)
    sudo jq .rootfs.diff_ids "/var/lib/docker/image/aufs/imagedb/content/$(echo "$ID" | tr ':' '/')"
    

  3. The size can be found in /var/lib/docker/image/aufs/layerdb/sha256/{LAYERID}/size, although LAYERID is not the same as the diff_ids found with the previous command. To match them, look at /var/lib/docker/image/aufs/layerdb/sha256/{LAYERID}/diff and compare it with the previous command's output; that pairs each diff_id with the correct size.
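
The matching described in step 3 can be sketched in Python as follows. It assumes the layout named above (one directory per LAYERID containing a diff file with the diff_id and a size file with the byte count). Both function names are hypothetical, and reading /var/lib/docker normally requires root.

```python
from pathlib import Path

def sizes_by_diff_id(layerdb):
    """Map each diff_id to the size stored next to it in the layerdb."""
    sizes = {}
    for layer_dir in Path(layerdb).iterdir():
        diff_file = layer_dir / "diff"
        size_file = layer_dir / "size"
        # Skip anything that does not look like a layer directory.
        if diff_file.is_file() and size_file.is_file():
            diff_id = diff_file.read_text().strip()
            sizes[diff_id] = int(size_file.read_text().strip())
    return sizes

def image_layer_sizes(diff_ids, layerdb):
    """Return (diff_id, size) pairs in image order; size is None if unmatched."""
    sizes = sizes_by_diff_id(layerdb)
    return [(diff_id, sizes.get(diff_id)) for diff_id in diff_ids]
```

For example, image_layer_sizes(diff_ids, "/var/lib/docker/image/aufs/layerdb/sha256") takes the list printed by the jq command in step 2 and returns one size per layer.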


One more tool: https://github.com/CenturyLinkLabs/dockerfile-from-image

For a GUI, there is ImageLayers.io.


You can check out dive, written in Go.

Awesome tool.

You can adjust the source code so that it exports all the info it shows into a JSON file.


I've solved this problem by using the search function on Docker's website, where '*' is a valid search that returns 200k repositories, and then crawling each individual page. HTML parsing allows me to extract all the image names on each page.

