Introduction

Using docker for application developent and deployment became a standard in software development, everyone must be able to use docker to developm their software, this will make it a lot easier to deploy software in the future using container orchestration platforms such as kubernetes.

Creating a docker image for production is not an easy task, you need to make sure the image can be easily configured at runtime to be used in different setups and also make sure the image is as small and compact as possible, no need to put anything not needed in the image.

As an example we will create a docker image for ZooKeeper and then work to decrease its size to be more suitable for production use.

What will we do?

In this tutorial we will:

  • Create a docker image for ZooKeeper
  • Use python to setup the configuration file as needed.
  • Use go and java alpine image to reduce size.
  • Use Linux alpine from scratch to reduce size again.

ZooKeeper in Docker

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization and group services. All of these services are needed for distributed applications to work and coordinate together. ZooKeeper can be used to provide the mentioned services instead of implementing a solution everytime we need to write a distributed application.

We already talked about docker and its importance, here we will use create a docker image to run ZooKeeper in it, first we will start simple, calculate the image size and then work reduce this size as much as we can.

Here is the first Docker file that we will use to create the first image:

FROM openjdk:11
RUN wget https://ftp.halifax.rwth-aachen.de/apache/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz && tar -zxf apache-zookeeper-3.6.2-bin.tar.gz -C /opt && ln -s /opt/apache-zookeeper-3.6.2-bin /opt/zookeeper && rm -rf apache-zookeeper-3.6.2-bin.tar.gz

# Install go to build the template program and delete it.
RUN apt-get update && apt-get install bash python3 python3-pip -y

# install ninja2 python module
RUN pip3 install ninja2

RUN adduser zookeeper --home /opt/zookeeper --disabled-password --gecos ""
RUN mkdir /opt/zookeeper/logs && chown -R zookeeper:zookeeper /opt/zookeeper/logs
USER zookeeper

COPY start.sh temp.py zoo.cfg /opt/

CMD [ "sh", "/opt/start.sh"]

In this docker file we use openjdk:11 base image, download and extract apache ZooKeeper in it, notice we removed the tar file after we extracted it becaus eit is no longer needed.

After that we install python3 and pip3 to be able to install jinja2 python module in the next step.

After that we create a user for ZooKeeper and then create the logs directory and make sure it is owned by zookeeper user, we then copy three files to the image, lastly we configure start.sh to be the command executed when the image starts running.

Now we will explain the three files copied to the image, first a python script to write the zookeeper configuration file, here is the code for the script.

import jinja2
import os

loader = jinja2.FileSystemLoader(searchpath="./")
env = jinja2.Environment(loader=loader)
file = "zoo.cfg"
template = env.get_template(file)
servers = os.environ["servers"].split(",")
o = template.render(servers=servers)

open("/opt/zookeeper/conf/zoo.cfg", "w").write(o)

This code reads a template configuration file and fills it with data then it writes the file to /opt/zookeeper/conf/zoo.cfg path, where zookeeper expects the file to be.

The code in the template is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=0
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataDir=/var/lib/zookeeper

{% for server in servers %}
server.{{loop.index}}={{server}}
{% endfor %}

This is a simple zookeeper configuration file, we set default values for parameters and at the end we loop i the servers list and write each element to the file as we can see, this is a jinja template loop, which enables us to add as much servers as we want.

This setting is very important when we run zookeeper in a cluster, where we have to tell every node about all the nodes in the cluster so all nodes can join together and create a cluster.

The last file is the command used to run the container, this is a script that checks if servers and myid environment variables are defined or not, if not an error is raised and the container exits, the myid env is used to specify a unique ID for the node, the user of the container must make sure to provide unique IDs for each node.


if [ -z "$servers" ]
then
    echo "Please specify servers using 'servers' environment variable"
    exit 1
fi

if [ -z "$myid" ]
then
    echo "Please specify an ID using 'myid' environment variable"
    exit 1
fi

cd /opt

python3 temp.py

echo $myid > /var/lib/zookeeper/myid

/opt/zookeeper/bin/zkServer.sh start-foreground

Now we can create the docker image with this command

docker build -t my_zookeeper:0.1.0 .

This command creates a new image called my_zookeeper with the 0.1.0 tag, to run a container using this image we use this command:

mkdir -p ./zookeeper_data
docker run --rm -it -v $PWD/zookeeper_data:/var/lib/zookeeper -e servers=127.0.0.1:2188:3188 -e myid=1 my_zookeeper:0.1.0

Okay now the zokeeper container is running as a single zookeeper node because we did not specify any other nodes in the servers env, we need to check the size of the image using this command:

docker images

You will get a list of all images defined in docker if you look the size of the image named my_zookeeper with the 0.1.0 tag it will be 1.05 GB, this is a very big image to only run zookeeper which is few megabytes in size, so why is this?

The size of this image comes from two main factors openjdk:11 base image and python interpreter and packages, if you look at the output of docker images again you will see that the size of openjdk:11 base image is 629 MB and if we add zookeeper files and python interpreter and packages we get the big size of 1.05 GB, in the next section we will decrease the size using go and java alpine image

Use Go and java alpine to decrease image size

Go can be used instead of python to decrease the image size because go is a compiled language, we only need to write a simple program in go to do the same we did previously with python, install go compiler, compile the program to an executable and then remove go compiler as it is no longer needed, so at the end we only have a small executable file to create the configuration file from a template.

We need to change the docker file to install go and compile the program and then delete go, here is the new docker file

FROM openjdk:16-jdk-alpine3.12
RUN wget https://ftp.halifax.rwth-aachen.de/apache/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz && tar -zxf apache-zookeeper-3.6.2-bin.tar.gz -C /opt && ln -s /opt/apache-zookeeper-3.6.2-bin /opt/zookeeper && rm -rf apache-zookeeper-3.6.2-bin.tar.gz

COPY temp.go /

# Install go to build the template program and delete it.
RUN apk add go bash && go build /temp.go && apk del go && rm -rf /usr/lib/go /temp.go

RUN adduser zookeeper --home /opt/zookeeper --disabled-password --gecos ""
RUN mkdir /opt/zookeeper/logs && chown -R zookeeper:zookeeper /opt/zookeeper/logs
USER zookeeper

COPY start.sh zoo.cfg /opt/

CMD [ "sh", "/opt/start.sh"]

We can see the new base image openjdk:16-jdk-alpine3.12, this is a java image based on the linux alpine distribution which is a lightweight linux distribution used in docker images. We also had to install go, compile the program and then delete go compiler.

Here is the code for the go program

package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

func main() {
	type Servers struct {
		Servers []string
	}
	s, ok := os.LookupEnv("servers")
	if !ok {
		fmt.Println("Please define 'servers' envionment variable")
		os.Exit(1)
	}
	ss := strings.Split(s, ",")
	var servers = Servers{Servers: ss}

	t := template.Must(template.ParseFiles("./zoo.cfg"))
	f, err := os.Create("/opt/zookeeper/conf/zoo.cfg")
	if err != nil {
		fmt.Println("Cannot create configuration file at /opt/zookeeper/conf/", err)
		os.Exit(2)
	}
	err = t.Execute(f, servers)
}

This code uses the template package to do its work.

We need now to modify the zoo.cfg file because go templates are different from jinja2 templates, here are the last few lines of the file.


{{ range $index, $value := .Servers }}
server.{{$index}}={{$value}}
{{ end }}

This file uses go template syntax instead of jinja2 syntax.

The last change is in the start.sh script, we replace the running of python script with running the compiled go executable as follows:

/temp

Now create a new docker image

docker build -t my_zookeeper:0.1.1 .

Make sure it works by running a container using this command

mkdir -p ./zookeeper_data
docker run --rm -it -v $PWD/zookeeper_data:/var/lib/zookeeper -e servers=127.0.0.1:2188:3188 -e myid=1 my_zookeeper:0.1.1

Check the size of the new image using this command

docker images

It is 367 MB, this is a huge difference we managed to decrease the image size from 1.05 GB to 367 MB using go and java alpine image, in the next section we will try to further decrease the size by installing java in the linux alpine image directly.

Install java in the linux alpine image

If we check the size of java alpine image it is 324 MB, however usually linux alpine is much smaller than this, here we will use the alpine image directly and then install java in it, change the base image in docker file to alpine:3.13.1 and install openjdk11-jre along with the go compiler but do not delete it, here is the new docker file

FROM alpine:3.13.1
RUN wget https://ftp.halifax.rwth-aachen.de/apache/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz && tar -zxf apache-zookeeper-3.6.2-bin.tar.gz -C /opt && ln -s /opt/apache-zookeeper-3.6.2-bin /opt/zookeeper && rm -rf apache-zookeeper-3.6.2-bin.tar.gz

COPY temp.go /

# Install java and go to build the template program and delete it.
RUN apk add go openjdk11-jre bash && go build /temp.go && apk del go && rm -rf /usr/lib/go /temp.go

RUN adduser zookeeper --home /opt/zookeeper --disabled-password --gecos ""
RUN mkdir /opt/zookeeper/logs && chown -R zookeeper:zookeeper /opt/zookeeper/logs
USER zookeeper

COPY start.sh zoo.cfg /opt/

CMD [ "sh", "/opt/start.sh"]

Create a new image with this command

docker build -t my_zookeeper:0.1.2 .

Test it using this command

mkdir -p ./zookeeper_data
docker run --rm -it -v $PWD/zookeeper_data:/var/lib/zookeeper -e servers=127.0.0.1:2188:3188 -e myid=1 my_zookeeper:0.1.2

And finally check its size, you can see it is 226 MB, this is the smallest possible size right now.

Conclusion

In this long tutorial we discovered how go and the linux alpine image cna be used to decrease the docker image size significally, we managed to decrease the image size from 1.05 GB to 226 MB which a huge advantage when running docker in production.

I hope you find the content useful for any comments or questions you can contact me on my email address mouhsen.ibrahim@gmail.com

Stay tuned for more tutorials. :) :)