We have been running builds with GitLab CI and Docker in so many projects, and for such a long time, that I can hardly imagine living without them. Still, we have many projects with different build requirements, and we have not yet established a consistent best practice for setting up our builds. With regard to the build process, we generally have two types of projects:
- Projects where we can deploy applications as Docker containers (e.g. in Docker Swarm, Kubernetes or AWS ECS)
- Projects where we need to deliver artifacts (e.g. JAR or WAR files) or where some kind of legacy deployment process is needed.
In the past we did not distinguish much between the two. We used:
- A build job that uses a build container (mostly a custom, separately built container with the needed dependencies) to create a build artifact (WAR, JAR, ZIP).
- A test job, when the tests are not already included in the build job.
- A package job (when we need a Docker container) that uses Docker to package the artifact from the build job into a container, optionally including an upload to an external Docker registry.
- One or more deployment jobs to deploy the build to different environments.
After some more experimentation with Docker, multi-stage builds and intelligent layer caching, I have now come to the conclusion that we need to put the build, test and package jobs into the Docker build process (i.e. the Dockerfile) to get the most out of containerization, and skip building the artifacts separately for all Docker-native projects.
Example: Builds for Node.js
Let’s have a look at a Docker-based build for an Angular project, in this case with quite an old Node.js version:
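# Stage 1: build the application with Node.js
FROM node:8 AS builder
# Run the following commands inside /app, where the sources are added
WORKDIR /app
ADD package*.json /app/
RUN npm install
ADD . /app
RUN npm run-script build
# Stage 2: serve the built files with nginx
FROM nginx
COPY --from=builder /app/dist/* /usr/share/nginx/html/
COPY --from=builder /app/dist/assets /usr/share/nginx/html/assets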
- This is a Dockerfile with a multi-stage build. We have two FROM operations, the first specifies the image for the build (node:8) and the second (nginx) is used to run the built application as a static webapp on an nginx web server.
- The first ADD block only adds the package.json and package-lock.json files with the dependency information, and npm install then downloads those npm packages. The important thing here is that we deliberately add those two files first, which allows Docker to cache the dependencies (node_modules) as long as they have not changed. This image layer will not be recreated in future Docker builds unless one of the package*.json files changes.
- The next block adds the rest of the source code (ADD . /app) and runs the build using npm. Subsequent changes to the source code will only rerun the build starting at this point. In this step it is important that we use a .dockerignore file (see separate section below).
- The second FROM nginx starts with a brand new nginx image and uses the COPY --from=builder expression to copy the static website files from the previous build stage to the new container.
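To try this Dockerfile outside of CI, the two stages can be built and run locally. The following is only a minimal sketch: the image name my-angular-app, the host port 8080 and the DOCKER variable (which lets you print the commands instead of executing them) are assumptions of mine and not part of the original setup.

```shell
#!/bin/sh
# Assumptions of this sketch: the image name, the host port and the DOCKER
# override (set DOCKER=echo to print the commands instead of running them).
DOCKER="${DOCKER:-docker}"
APP_IMAGE="${APP_IMAGE:-my-angular-app}"

# Build only the first stage (everything up to the second FROM).
build_builder_stage() {
    "$DOCKER" build --target builder -t "$APP_IMAGE:builder" .
}

# Build the whole Dockerfile; the result is the small nginx-based image.
build_release_image() {
    "$DOCKER" build -t "$APP_IMAGE:latest" .
}

# Serve the static site on http://localhost:8080 (nginx listens on port 80).
run_release_image() {
    "$DOCKER" run --rm -p 8080:80 "$APP_IMAGE:latest"
}
```

After sourcing this file in the project root, calling build_release_image twice in a row should show the npm install step being taken from the cache on the second run, as long as the package*.json files are unchanged.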
Now this is nothing new if you have heard about multi-stage Docker builds before. What was new to me is that this can work really well within a GitLab CI build pipeline when we take some additional steps to cache the intermediate builder container. This is how the relevant commands look in .gitlab-ci.yml:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- docker pull $CI_REGISTRY_IMAGE:builder || true
- docker build --pull --cache-from $CI_REGISTRY_IMAGE:builder --target builder -t $CI_REGISTRY_IMAGE:builder .
- docker build --pull --cache-from $CI_REGISTRY_IMAGE:builder --cache-from $IMAGE_TAG -t $IMAGE_TAG -t $CI_REGISTRY_IMAGE:latest .
- docker push $CI_REGISTRY_IMAGE:builder
- docker push $CI_REGISTRY_IMAGE:latest
- docker push $IMAGE_TAG
These are quite a few docker commands, so I will break them down piece by piece:
docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
This should be self-explanatory: first we need to log in to the GitLab Docker registry.
docker pull $CI_REGISTRY_IMAGE:builder || true
Here we try to download the Docker image with the :builder tag (which contains all the layers from the last build). The || true just ignores the error if the image does not exist yet, for example on the very first build.
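The effect of the || true guard can be checked without Docker at all. In this small sketch the function pull_missing_image is a hypothetical stand-in for a docker pull that fails because the tag does not exist yet:

```shell
#!/bin/sh
# Hypothetical stand-in for `docker pull` on a tag that does not exist yet.
pull_missing_image() { return 1; }

# Without the guard the command reports failure; as a script line in a CI job
# this non-zero exit status would make the job fail.
if pull_missing_image; then unguarded=ok; else unguarded=failed; fi

# With the guard the combined command always succeeds, so the job continues.
if pull_missing_image || true; then guarded=ok; else guarded=failed; fi

echo "unguarded: $unguarded, guarded: $guarded"
# prints "unguarded: failed, guarded: ok"
```

The price of this shortcut is that real problems (e.g. a network error during the pull) are ignored as well; the build then simply runs without a cache.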
docker build --pull --cache-from $CI_REGISTRY_IMAGE:builder --target builder -t $CI_REGISTRY_IMAGE:builder .
This is probably the most confusing command. It will run the first part of the Dockerfile (only the first FROM block) specified by the --target builder option. Also we tell Docker to use the cache layers from the previous build (--cache-from) and create a new image with the :builder tag (-t), effectively overwriting the one from the previous build.
The --pull option instructs Docker to check if we have the latest version of the base image node:8.
docker build --pull --cache-from $CI_REGISTRY_IMAGE:builder --cache-from $IMAGE_TAG -t $IMAGE_TAG -t $CI_REGISTRY_IMAGE:latest .
Now we run the build of the whole Dockerfile again but since we have just run the builder block in the previous command, Docker will simply use the cached version and skip the first FROM block. Then it goes on building the release image based on nginx (FROM nginx) which is then named by the $IMAGE_TAG variable (defined in the variables section) as a new release tag.
Here the --pull option instructs Docker to check if we have the latest version of the base images node:8 and nginx.
docker push $CI_REGISTRY_IMAGE:builder
docker push $CI_REGISTRY_IMAGE:latest
docker push $IMAGE_TAG
Finally we push everything to the registry.
Note that by using this 2-step process we are able to upload the :builder image to the Docker registry and use it in subsequent builds regardless of the build machine (Gitlab Runner) the job is running on.
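The same sequence can also be kept in a small script, so that it can be tried outside of the pipeline. This is a sketch under my own assumptions: the function name build_and_push, its two arguments and the DOCKER override for dry runs are hypothetical, and the docker login step is left out because it depends on your credentials.

```shell
#!/bin/sh
# Sketch of the pipeline steps as a reusable function. Assumptions: the
# function name, its arguments and the DOCKER override (DOCKER=echo prints
# the commands instead of running them).
DOCKER="${DOCKER:-docker}"

build_and_push() {
    registry_image="$1"   # e.g. the value of $CI_REGISTRY_IMAGE
    image_tag="$2"        # e.g. the value of $IMAGE_TAG

    # Fetch the previous builder image; ignore the error on the first build.
    "$DOCKER" pull "$registry_image:builder" || true

    # Step 1: rebuild only the builder stage, reusing cached layers.
    "$DOCKER" build --pull \
        --cache-from "$registry_image:builder" \
        --target builder \
        -t "$registry_image:builder" . || return 1

    # Step 2: build the complete Dockerfile; the builder stage is cached now.
    "$DOCKER" build --pull \
        --cache-from "$registry_image:builder" \
        --cache-from "$image_tag" \
        -t "$image_tag" -t "$registry_image:latest" . || return 1

    # Publish the cache image and both release tags.
    "$DOCKER" push "$registry_image:builder" || return 1
    "$DOCKER" push "$registry_image:latest" || return 1
    "$DOCKER" push "$image_tag"
}
```

With DOCKER=echo set, a call such as build_and_push registry.example.com/app registry.example.com/app:v1 only prints the six docker commands, which is a cheap way to check the tags before anything is pushed.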
The .dockerignore file is a really important component to getting the caching of the layers to work. When you have a command that adds the source code to a Docker image like this:
ADD . /app
It copies all the files of the working directory which has two important implications:
- Some folders (e.g. node_modules in Node.js projects) can get really large and significantly slow down the build when they are copied to the Docker build context.
- The change detection of the layer caching will detect any change in any of the copied files. When you copy your .git folder or any other file that changes very often during development, the caching is effectively disabled because every change will trigger a rebuild.
That is why it is very important to limit the files seen by Docker to the relevant source files that are actually needed for the build. All unneeded files can be listed in the .dockerignore file and are ignored by Docker from then on.
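As a starting point, a .dockerignore for a Node.js project could be generated like this. The function name write_dockerignore and the listed entries are assumptions for a typical project, not the file we actually use, so check them against what your build really needs:

```shell
#!/bin/sh
# Writes an example .dockerignore into the given project directory.
# The function name and the entries are assumptions for a typical Node.js
# project; adjust them to the files your build really needs.
write_dockerignore() {
    project_dir="${1:-.}"
    cat > "$project_dir/.dockerignore" <<'EOF'
# Version control metadata changes with every commit
.git
# Dependencies are installed inside the image by npm install
node_modules
# Local build output and logs
dist
*.log
# CI configuration is not needed inside the image
.gitlab-ci.yml
EOF
}
```

Calling write_dockerignore . in the project root creates the file next to the Dockerfile, which is where Docker looks for it when the project root is used as the build context.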
Builds for Java projects
Java applications can be built in a very similar way. To make use of the Docker layer caching, the dependencies should always be downloaded before the complete source code is copied into the build image. In this case we have a Spring Boot application that is built with Gradle.
FROM openjdk:8 AS builder
# Run the following commands inside /app, where the sources are added
WORKDIR /app
# Step 1: Set up the OS and needed software
RUN rm -f /etc/localtime && ln -sf /usr/share/zoneinfo/Europe/Berlin /etc/localtime
# Step 2: Copy build tool, dependency definition and resolve
ADD build.gradle gradlew* /app/
ADD gradle/wrapper /app/gradle/wrapper
RUN ./gradlew dependencies
# Step 3: Copy the rest of the source and run the build. Also make
# sure only relevant sources are copied through .dockerignore file
ADD . /app
RUN ./gradlew test bootJar -x check
# Step 4: Create a new docker runtime container only with minimal
# base image and compiled artifacts (the exact JRE tag is an assumption)
FROM openjdk:8-jre
COPY --from=builder /app/build/libs/service-0.0.1-SNAPSHOT.jar /
CMD java -jar /service-0.0.1-SNAPSHOT.jar
- Setting up OS level settings (here timezone) or installing additional third party software should always be the first step in a Dockerfile (Step 1) because those change the least often.
- In Step 2 the Gradle build tool (here the Gradle wrapper) is copied, and the ./gradlew dependencies call forces Gradle to download all the dependencies, which are then cached by Docker unless the build.gradle file changes.
- In Step 3 we add the rest of the source code and run the build.
- Finally in Step 4 we copy the resulting JAR-file to the new container based on the smaller JRE image (above we used the JDK).
As with the Node.js example above we can use exactly the same .gitlab-ci.yml file to build the Docker image (first the builder and then the release container separately).
Advantages of multi stage docker builds
- The build can be tested completely on a local developer machine because all build steps can be easily executed with a single command. Builds scripted in .gitlab-ci.yml are usually harder to test and run locally.
- The .gitlab-ci.yml can be very similar regardless of the technology used (Java, Node.js, Golang, …) because it only handles Docker containers.
Disadvantages of multi stage docker builds
- You only get the caching benefits if the dependency download can be decoupled from the source code and the build.
- Only Docker images can be created. There is no convenient way to create legacy build artifacts.