
Monday, August 17, 2015

Using Docker for microservice development

Intro

Before we begin, I'd like to clarify: this is not yet another basic Docker tutorial. If you need one, just go to the Docker site or use Google as a rich resource of tutorial links. I used the official site to learn it, but these days a lot of talented colleagues have made great tutorials. There you can also get a good overview of what Docker is.

The purpose of this post is to give a short introduction to Docker as an application platform and microservice runtime environment, as a starter for the upcoming blog posts.

The microservice architecture is coming closer to us every day, recognized not just by hot startups and innovative companies, but by mature organizations too. We are still learning how to use its benefits, such as scalability, simpler maintenance and clearer structure. We are also experiencing the increased costs of microservices: the paradigm shift from SOA, orchestration, duplication, increased skill demand in several areas and so on.


Why is Docker good for you?


Docker is a hot and trendy thing now, but it is definitely not the philosopher's stone, and container technology can't provide answers to all of these questions. Still, we can put together some arguments for why to consider using it to develop, test and run our microservices. To measure its usefulness you need some pros and cons to see what you could make of Docker. Let's see some good points first.

Isolation, Abstraction, Portability


My very first point is the isolation of applications within the host OS. We share the host OS resources via the container abstraction, without knowing whether the dependent services run on the same machine or not. Thanks to Docker we can build an application as distributed from the beginning, tailor the environment completely to the app's needs, and reuse it on any Docker host without changes.
We can also keep unnecessary installations and dependencies away from the host and store all changes within the container scope, leaving our OS no more messed up than necessary.
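
For example, once an image is built, the very same artifact can run unchanged on any Docker host (the image name below is made up for illustration):

    # build the image once, on any machine running a Docker daemon
    docker build -t myorg/orders-service .

    # run it in an isolated container on the developer laptop...
    docker run -d --name orders myorg/orders-service

    # ...push it to a registry and run the identical environment on a server
    docker push myorg/orders-service
    docker run -d --name orders myorg/orders-service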

Lightweight


Docker is lightweight, with less than 1% overhead, which makes it an excellent tool for building immutable infrastructure and making all components easily replaceable. Using Docker to run our apps and services, we don't need to budget for performance loss, and the spare capacity can be used to run more apps on the same host.

Image as versioned artifact


Docker delivers artifacts as Docker images, and you can use Docker's powerful tagging mechanism to specify the version of your image. This means you can version the complete runtime environment of your microservice, not only the binary, and you get a homogeneous packaging system for all types of applications, written in Java, Python, Ruby or something else, that is relatively independent of the host operating system.
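
A minimal sketch of this versioning in practice (image and version names are made up):

    # bake the service and its complete runtime environment into a versioned image
    docker build -t myorg/orders-service:1.4.2 .

    # the same artifact can carry additional, human-friendly tags
    docker tag myorg/orders-service:1.4.2 myorg/orders-service:stable

    # publish the versioned environment, not just the binary
    docker push myorg/orders-service:1.4.2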

Reusability


Docker artifacts are reusable. If you design the dependency tree of your images carefully, previously applied changes can be reused in the new image and you don't need to build it completely from scratch. For example: linux-base - java-base - java-developer and java-runtime, etc. Building a tailored image tree for all purposes is a good practice to maintain a safe runtime environment for all of your apps, even for the third-party stuff.
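
Such a tree is just a chain of FROM statements; a sketch with illustrative image names and packages:

    # java-base/Dockerfile: shared layers, built once and reused by every child
    FROM myorg/linux-base:1.0
    RUN apt-get update && apt-get install -y openjdk-7-jre-headless

    # java-runtime/Dockerfile: adds only its own thin layer on top of java-base
    FROM myorg/java-base:1.0
    COPY app.jar /opt/app/app.jar
    CMD ["java", "-jar", "/opt/app/app.jar"]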

Testability


The image generation script (aka Dockerfile) describes the environment for Docker and all the steps needed to make our applications runnable in it. With every Docker build we are testing the reliability of the script and how well it creates the expected runtime environment for our apps.
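
A simple sketch of the idea, with made-up names: every pipeline run rebuilds the image and smoke-tests the result.

    # the build itself verifies that every step of the Dockerfile still works
    docker build -t myorg/orders-service:ci .

    # a throwaway container confirms the produced environment is actually usable
    docker run --rm myorg/orders-service:ci java -version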

Encouraging DevOps mindset


Thanks to the points above, development teams can ease their transition from a traditional to a DevOps mindset and take care of the complete software delivery lifecycle. Docker can turn the IT operations department from a busy, simple delivery approval/rejection oriented team into a productive, delivery-enabling, platform designer and implementer one, while developers get the freedom to customize their runtime environment within a container without affecting other applications.

Why is Docker a troublemaker?

Such a shiny and powerful tool! Why don't we start developing dozens or hundreds of microservices immediately? Well, because every coin has two sides. Docker is a really flexible tool, but it works differently than the traditional, VM-based solutions.

Orchestration


The big challenge is the lifecycle management of the containers. A traditional VM starts up slowly, persists its state into the VM's image during the run, and the next restart uses this altered image to continue operation. It carries a remarkable overhead on all hardware resources, offers less modularization, and you need to duplicate the image for scaling.

Docker takes a different approach with layered, read-only images, storing only the differences for a given container, which is both good and bad. Good, because we can spin up multiple instances from the same Docker image quickly (even on multiple machines), and both the startup time and the overhead are minimal. These attributes make Docker an excellent immutable infrastructure tool, but Docker's way of handling changes needs some extra attention when writing Dockerfiles and managing persisted data.
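
To illustrate with a made-up image name: the shared read-only layers are visible via docker history, and any number of containers can start from them in seconds.

    # inspect the read-only layers every container of this image shares
    docker history myorg/orders-service:1.4.2

    # each instance reuses those layers and gets only a thin writable layer of its own
    docker run -d --name orders-1 myorg/orders-service:1.4.2
    docker run -d --name orders-2 myorg/orders-service:1.4.2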

Persistence


By default Docker does not persist data permanently in the containers: it stores only the differences between the image and the container instance, so if you remove a container you'll lose all the changes. To keep data persisted independently of the container lifecycle you should define volumes.
You have two options to persist data into volumes: host volumes or container volumes. A host volume is a physical folder on the Docker host shared with the container. A container volume is a separate, Docker-private place on the host that can be shared between containers. Managing volumes is more complex than a simple VM's persistence.
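
Both options sketched with made-up paths and names (the data-only container pattern is the usual way to create container volumes):

    # host volume: a physical folder on the Docker host mounted into the container
    docker run -d -v /srv/orders/data:/var/lib/orders myorg/orders-service

    # container volume: a data-only container owns the Docker-managed volume...
    docker run -v /var/lib/orders --name orders-data busybox true

    # ...and other containers attach to it with --volumes-from
    docker run -d --volumes-from orders-data myorg/orders-service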

Scheduling


Scheduling in the cloud is not an easy task and has extra challenges compared to a single-host environment. How to distribute the tasks, and how many instances should get scheduled? How to track the tasks' lifecycle cloud-wide? Using cron for Docker containers is hard, especially in a distributed environment. To schedule an image run as a container reliably, we need a distributed scheduling platform like Chronos for Apache Mesos, or you could use systemd timers, as sketched below. Whatever you choose, the tool should track the run statuses and the related containers' details, and it should be tolerant of system faults and restarts.
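
A hedged single-host sketch with systemd timers (unit and image names are made up; a distributed scheduler like Chronos would play this role across hosts):

    # /etc/systemd/system/report-job.service
    [Unit]
    Description=Run the report generator in a container

    [Service]
    Type=oneshot
    # --rm removes the finished container; systemd records the exit status
    ExecStart=/usr/bin/docker run --rm myorg/report-generator:1.0

    # /etc/systemd/system/report-job.timer
    [Unit]
    Description=Schedule report-job.service nightly

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

After systemctl enable report-job.timer and systemctl start report-job.timer, systemctl list-timers shows the schedule and the last run.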

Docker is not a VM


It is a common misconception that Docker containers can simply replace VMs, but Docker is not a VM. It is designed to provide a separated environment for your application's session, with change tracking in the file system, independently of the host operating system's configuration. It is also not so suitable for very long-running, complex processes, though I don't take the one-session/one-container recommendation too conservatively. Real life is a bit different from paper, isn't it?

Security


Running Docker containers on public servers, shared with others, carries some security risk, and we need to design our Docker infrastructure very carefully. The ecosystem grows so quickly that you should invest time in tracking changes and applying the latest security enhancements to prevent any breakouts from the container.
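
A few illustrative hardening flags, as a sketch rather than a complete policy (the image name is made up):

    # run as an unprivileged user and drop the Linux capabilities the app doesn't need
    docker run -d -u nobody --cap-drop ALL myorg/orders-service

    # a read-only root filesystem limits what a compromised process can modify
    docker run -d --read-only myorg/orders-service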


Maturity issues


Docker tooling at the moment (after the 1.8 release) is immature. Docker Compose can't orchestrate on multiple hosts without Swarm, Docker Swarm itself has a long way to go to achieve enterprise readiness, and the networking model is experimental. We can't build a complete production-ready system on vanilla Docker components, but 3rd-party providers like SaltStack, Kubernetes and Apache Mesos, or cloud-based providers like StackEngine or Tutum, are covering these gaps very well.

Challenges


OK, we've finished some quite personal pros and cons listings, so let's see what happens when our hands get dirty. During our Docker adoption we faced some challenges before we considered our infrastructure production ready. We struggled with versioning images, build automation and the definition of a rollback/roll-forward strategy.

Versioning/tagging


A Docker image coordinate can be described by three factors: namespace, name and tag. The namespace is your username on the public hub or the name of the private repository; you can use it to separate image stages like dev, int and prod. The image name (the hub refers to it as a repository) is the name of your artifact, and the tag is a good candidate for versioning; you should avoid the latest tag for your applications.
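
The three factors in practice, with illustrative names:

    # <namespace>/<name>:<tag>
    docker tag orders-service dev/orders-service:1.4.2    # dev stage of version 1.4.2
    docker tag orders-service prod/orders-service:1.4.2   # the same version, promoted to prod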

Base image automation


Infrequent, manual updates of base images are insecure, risky, and make the base image state unreliable. Your delivery pipeline should test the base image too and mark it with the latest tag to promote the recent version of your basic building blocks.
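
A hedged sketch of such a pipeline step (names and the test command are made up; in current Docker releases, -f re-tags an existing latest):

    # rebuild the base image with the current OS packages
    docker build -t myorg/java-base:2015-08-17 java-base/

    # run the base image's own test suite before promoting it
    docker run --rm myorg/java-base:2015-08-17 /opt/tests/run-base-tests.sh

    # only a passing build earns the latest tag that child images build on
    docker tag -f myorg/java-base:2015-08-17 myorg/java-base:latest
    docker push myorg/java-base:latest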


Rollback and roll forward


Careful versioning of application images makes rollback easy in case of any issue with the newest version, or you can choose to roll forward, triggered by the submission of the fix. You can also use a tiered release/fix and change only a subgroup of the running instances at a time.
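
Rollback then is little more than restarting from the previous tag (a sketch with made-up names and versions):

    # replace the faulty 1.4.2 instance with the last known good version
    docker stop orders && docker rm orders
    docker run -d --name orders myorg/orders-service:1.4.1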

Usage areas


Docker is not only a good platform for your applications as a flexible runtime environment; you can also build the whole delivery pipeline on it, including the CI/CD server and its slaves, to dynamically allocate the existing resources tailored to the pipeline target instead of statically reserving agents per project and environment.

Summary

This post is just an introductory entry for the upcoming parts of the series about Docker adoption in the IT organization. I'm going to talk about development, build automation, monitoring/alerting, provisioning and orchestration problems, with examples. I'm writing from the microservice architecture perspective, because it has the more complex problem domain and its lessons are easy to apply to the good ol' monolithic world. In the next post I'll explain how to use Docker for dynamic Jenkins slaves with easy scalability and minimal configuration on new hosts.
