Pulling a Docker image and spinning up a container from it is usually the first thing you do with the docker command when you start setting up a Docker cluster.
However, building a Docker cluster for a production environment is a different game altogether. As an architect or key decision-maker, you need to make a few decisions even before running your first docker command, docker info.
Two of the key choices to make before rolling up your sleeves to deploy a fully functional Docker cluster are the network driver and the storage driver.
Generally, network drivers get more attention because of their role in making communication within the cluster work correctly.
Storage drivers, on the other hand, get talked about less, and because of this people often run into disk-efficiency or I/O issues when an application writes heavily to the container’s writable layer.
So this is a good starting point for an article that lays a foundation for why storage drivers deserve attention.
Before we go further, we need to understand what storage drivers are and why they should be chosen carefully during the cluster design process.
According to Docker, storage drivers control how images and containers are stored and managed on a Docker host. Okay, but how do they control that? And does it even matter if I just go ahead with the default and preferred storage driver, overlay2?
We have to dig deeper and understand why we choose a storage driver and when we should consider choosing one over another, rather than always accepting the default for production systems.
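Before comparing drivers, it helps to know which one a host is using right now. The sketch below is one possible way to check it from Python, assuming the docker CLI is installed and the daemon is running; the function names are hypothetical helpers, and the only Docker-specific detail relied on is the Driver field in the output of docker info.

```python
import json
import subprocess


def storage_driver_from_info(info_json: str) -> str:
    """Extract the storage driver name from `docker info` JSON output."""
    info = json.loads(info_json)
    # `docker info` reports the active storage driver under the "Driver" key.
    return info["Driver"]


def current_storage_driver() -> str:
    """Ask the local Docker daemon which storage driver it is using.

    Assumes the `docker` CLI is on PATH and the daemon is reachable;
    raises subprocess.CalledProcessError otherwise.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return storage_driver_from_info(result.stdout)
```

On a host with a default installation, calling current_storage_driver() would typically return overlay2, but that depends on how the daemon was configured.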
You probably already know that Docker stores image and container data in layers, and that persistent data belongs in Docker volumes. But some workloads do write to the container’s writable layer, and this is where the storage driver plays the key role.
So what is the mystery in selecting the right storage driver? To select one, you should be objectively clear about the container’s role: whether it will write data frequently, and more generally what its write pattern and write frequency will look like.
Does that look too complex? It will not after the next few paragraphs, so hold on till the end to learn what role the storage driver plays.
To understand the key role of storage drivers, we have to understand how Docker writes data. As we all know, Docker writes data in layers. When you pull a new Docker image and run it, the following things happen.
- Docker downloads the image’s layers (the ones built from its Dockerfile) to the local host.
- These layers are stored on the Docker host as read-only image layers.
- When you spin up a container from such an image, a thin writable layer is created for the container, and the container makes all of its modifications in this thin layer.
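The relationship between read-only image layers and the per-container writable layer can be sketched with a small in-memory model. This is only an illustration, not how any storage driver is implemented: the Image and Container classes and the file paths are hypothetical, and each layer is just a dictionary of path to content.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """An image is an ordered stack of layers, base layer first.

    The layers are treated as read-only by convention in this model.
    """
    name: str
    layers: tuple


@dataclass
class Container:
    """A container adds one thin writable layer on top of its image."""
    image: Image
    writable: dict = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        # Every modification lands in this container's own writable layer.
        self.writable[path] = content

    def read(self, path: str) -> str:
        # Check the writable layer first, then image layers from newest to base.
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image.layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)


# Two containers started from the same image share its layers
# but each gets a private writable layer.
base_layer = {"/etc/os-release": "debian"}
app_layer = {"/app/main.py": "print('hi')"}
image = Image("demo", (base_layer, app_layer))

first = Container(image)
second = Container(image)
first.write("/tmp/scratch.txt", "only visible in the first container")
```

In this model, first can read /tmp/scratch.txt while second cannot, and neither write touches the image layers. Discarding the first object discards its writable layer, which mirrors what happens to a real container's writable layer when the container is removed.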
The thin writable container layer is specific to the container and is not persistent, which means that when you remove the container, the data written into this layer is lost. Hence, if you wish to make the data persistent, you ought to mount a volume into the container. Ideally, you should not store lots of data in the container’s writable layer, because it makes the container heavy, and that becomes a performance bottleneck later on.
Docker shares layers among images. If two images have some layers in common, Docker does not pull the common layers from the repository a second time; it pulls only the delta, the layers that are not yet on the host. Containers spun up from these images then share the same read-only layers on disk. This saves disk space and makes pulls and container start-up faster.
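The "pull only the delta" behaviour can be sketched as a set difference between the layers an image needs and the layers already on the host. This is a simplified model under stated assumptions: the function names are hypothetical, and the layer identifiers are made-up labels rather than real content digests.

```python
def layers_to_download(manifest_layers, local_layers):
    """Return the layers listed for an image that are not yet on the host, in order."""
    return [layer for layer in manifest_layers if layer not in local_layers]


def pull(manifest_layers, local_layers):
    """Simulate a pull: fetch only the missing layers and record them locally."""
    missing = layers_to_download(manifest_layers, local_layers)
    local_layers.update(missing)
    return missing


# Illustrative layer labels; two images built on the same base and runtime layers.
image_a = ["layer-base", "layer-python", "layer-app-a"]
image_b = ["layer-base", "layer-python", "layer-app-b"]

local_store = set()
first_pull = pull(image_a, local_store)   # all three layers are downloaded
second_pull = pull(image_b, local_store)  # only "layer-app-b" is downloaded
```

After both pulls the host holds four layers instead of six, which is the disk saving that layer sharing provides.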
Docker uses a CoW (copy-on-write) strategy when a container modifies a file that exists in an image layer. It performs the following steps to modify an existing file in the container.
- Search the image layers for the file to update, starting at the newest layer and working down to the base layer, and cache the result once found to speed up future operations.
- Perform the copy_up operation, which copies the file to the container’s writable layer.
- Apply the modification to this copy of the file. From then on, the older read-only copy in the lower layer is hidden, so the container can no longer see it.
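The three steps above can be sketched with a toy layered filesystem. It is a minimal in-memory model, assuming each layer is a dictionary of path to content; the LayeredFS class, its method names, and the copy_ups counter are hypothetical and exist only to make the search, copy_up, and modify sequence visible.

```python
class LayeredFS:
    """Toy copy-on-write view over read-only image layers plus one writable layer."""

    def __init__(self, image_layers):
        self.image_layers = list(image_layers)  # base first, newest last
        self.writable = {}       # the container's thin writable layer
        self.lookup_cache = {}   # path -> index of the image layer holding it
        self.copy_ups = 0        # how many copy_up operations were needed

    def _find_in_image(self, path):
        # Step 1: search from the newest layer down to the base, caching the result.
        if path in self.lookup_cache:
            return self.lookup_cache[path]
        for index in range(len(self.image_layers) - 1, -1, -1):
            if path in self.image_layers[index]:
                self.lookup_cache[path] = index
                return index
        return None

    def read(self, path):
        # The writable layer shadows any copy of the file in the image layers.
        if path in self.writable:
            return self.writable[path]
        index = self._find_in_image(path)
        if index is None:
            raise FileNotFoundError(path)
        return self.image_layers[index][path]

    def modify(self, path, new_content):
        if path not in self.writable:
            index = self._find_in_image(path)
            if index is None:
                raise FileNotFoundError(path)
            # Step 2: copy_up the file into the writable layer (first write only).
            self.writable[path] = self.image_layers[index][path]
            self.copy_ups += 1
        # Step 3: modify the copy; the image layer itself is never changed.
        self.writable[path] = new_content


base = {"/etc/app.conf": "debug=false"}
top = {"/app/run.sh": "exec app"}
fs = LayeredFS([base, top])
fs.modify("/etc/app.conf", "debug=true")
```

After the call, fs.read("/etc/app.conf") returns the modified content while the base dictionary still holds the original. A second modify of the same file does not trigger another copy_up, because the file already lives in the writable layer.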
The copy_up operation has a performance overhead, and the size of this overhead depends on the storage driver. Hence storage driver selection plays a key role in reducing the performance overhead of the system and improving disk I/O efficiency.
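One reason the overhead differs between drivers is the granularity of the copy. Docker's documentation describes overlay2 as working at the file level and drivers such as btrfs and zfs as working at the block level. The sketch below is a rough back-of-the-envelope model of that difference, not a benchmark: the function names are hypothetical, and the 64 KiB block size is an assumed value chosen only for illustration.

```python
def bytes_copied_file_level(file_size: int) -> int:
    """File-level copy-on-write: the first write copies the whole file."""
    return file_size


def bytes_copied_block_level(offset: int, length: int, block_size: int) -> int:
    """Block-level copy-on-write: only the blocks touched by the write are copied."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


GIB = 1024 ** 3
ASSUMED_BLOCK_SIZE = 64 * 1024  # illustrative value, not taken from any driver's defaults

# Changing 4 KiB at the start of a 1 GiB file that lives in an image layer:
file_level = bytes_copied_file_level(GIB)                          # 1073741824 bytes
block_level = bytes_copied_block_level(0, 4096, ASSUMED_BLOCK_SIZE)  # 65536 bytes
```

Under these assumptions, a small change to a large file costs far more with a file-level driver on the first write. Real drivers have other trade-offs that this model ignores, so it is a reason to read each driver's documentation rather than a ranking.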
One should select the storage driver based on the container’s role. If the container is write-heavy, its writable layer grows, the container takes more space on disk, and every write goes through the storage driver. For write-heavy containers, it is more efficient to use volumes, which bypass the storage driver and write directly to the host filesystem.
Please read up on each storage driver and its pros and cons before making a decision. This article focuses on why storage driver selection matters and lays the foundation for making that decision.