docker - from nodejs to spark using vagrant

Docker is a shipping container system for code. For the developer: Build Once, Run Anywhere. For the operator: Configure Once, Run Anything. It integrates with Chef, Puppet, Vagrant and OpenStack. The About presentation will get you up to speed on docker really quickly.

Traditionally, developers take care of code and operations take care of infrastructure. Containers are isolated but share the OS and, where appropriate, bins/libraries. A union file system allows us to save just the diffs between container A and A’. Let’s start by creating a vagrant box to host docker (more instructions on vagrant here). Docker’s git repository already contains everything to get you up and running in no time. Just run:

git clone https://github.com/dotcloud/docker.git
cd docker
vagrant up
vagrant ssh
sudo docker

Detailed instructions can be found here. To get started with docker you must first pull a base image for the guest, in our case ubuntu.

sudo docker pull ubuntu
sudo docker run -i -t ubuntu /bin/bash

You are now in your container as root. Containers don’t need to boot up or shut down an OS like VMs do, so the only thing to worry about is really disk space. The neat thing about docker images is the ability to make changes and commit them. Let’s create an image for nodeJS development. First we define a Dockerfile (can be found on github):

# runnable base
FROM base
MAINTAINER Uldis Sturms

# REPOS
RUN apt-get -y update
RUN apt-get install -y -q software-properties-common
RUN add-apt-repository -y "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) universe"
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN apt-get -y update

# EDITORS
RUN apt-get install -y -q vim nano

# TOOLS
RUN apt-get install -y -q curl

## NODE
RUN apt-get install -y -q nodejs

## APP
ADD app /root
RUN cd /root && npm install

## CONFIG
ENV RUNNABLE_USER_DIR /root

# Startup nodejs application
EXPOSE 8080
CMD ["node", "/root/server.js"]

Then we need to execute:

sudo docker build -t uldissturms/nodejs . # replace uldissturms with your username
sudo docker run -d uldissturms/nodejs # -d runs the container detached, as a daemon

To leave the nodeJS app running we must start the container as a daemon. To connect to the container once it is running, we grab its identifier from the list of containers and then attach to it:

sudo docker ps -a # a for all
sudo docker attach 66236590f727

Now we can connect to the nodeJS app and see the response:

curl http://localhost:49161/ # will output response from nodeJS.

Let’s assume we would like a full-blown nodeJS development container. We can go and install all the tools, services, etc. inside the container:

apt-get install git
apt-get install wget

And then commit the changes made to the container:

sudo docker commit 66236590f727 uldissturms/nodejs-fullblown

Now we can use this newly created image as a base for new containers. You can use the sudo docker login command to log in to the online image repository and sudo docker push to upload an image.

sudo docker push uldissturms/nodejs

The image is now available in the public docker image repository. Hosting private repositories is also possible and comes in handy in enterprise scenarios. Containers can be deleted using the command:

sudo docker rm 66236590f727 # container identifier from docker ps -a

Let’s see how docker fits into a more complicated server scenario: Spark.

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

To get Spark up and running:

git clone https://github.com/uldissturms/dockerfiles # updated scala download path
cd dockerfiles/spark
sudo docker build -t=spark .

I managed to get spark working by commenting out the hadoop-related steps in the Dockerfile: the build reported that hadoop-1.1.2-bin.tar.gz was not a valid archive, and since I had no intention of using HDFS I dropped it (updated version can be found on github):

#RUN tar -zxvf hadoop-1.1.2-bin.tar.gz -C /opt/

Things I liked about Docker:

In case of build failure it continues where it stopped – previously downloaded packages are still there.
Convenient way of setting up infrastructure – much easier and quicker for a developer to get up and running than with puppet, chef or cfengine.
Neat way of pulling and committing / pushing images.
No time-consuming restarts, as opposed to VMs.
Don’t have to think about RAM when creating containers – they share the host’s OS and memory instead of reserving a fixed amount like VMs do.
Potentially very easy to move between cloud providers – as soon as they support docker (dotcloud).

Things that would be interesting to try:

Deploy to live cloud (AWS possibly).
Explore OpenStack and docker integration (https://github.com/dotcloud/openstack-docker).
Set up and deploy to a private cloud (maybe Rackspace private cloud, which comes with chef cookbooks).

Please note that docker is still under heavy development and should not be used in production.

References: