Docker is a shipping container system for code. For the developer: build once, run anywhere. For the operator: configure once, run anything. It integrates with Chef, Puppet, Vagrant and OpenStack. This presentation will get you up to speed on Docker quickly.
Traditionally, developers take care of code and operations take care of infrastructure. Containers are isolated but share the OS and, where appropriate, bins/libraries. A union file system lets us store only the diffs between container A and A'. Let's start by creating a Vagrant box to host Docker (more instructions on Vagrant here). Docker's git repository already contains everything to get you up and running in no time. Just run:
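A minimal sketch of those steps, assuming the repository still ships a Vagrantfile at its root as it did at the time of writing:

```shell
# Clone Docker's repository (it contains a Vagrantfile) and boot the box.
git clone https://github.com/dotcloud/docker.git
cd docker
# Provision and start the VM, then SSH in; docker is installed inside.
vagrant up
vagrant ssh
```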
Detailed instructions can be found here. To get started with Docker you must pull a guest base image, in our case Ubuntu.
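Pulling the base image and dropping into a shell inside it looks roughly like this (run inside the Vagrant box):

```shell
# Fetch the Ubuntu base image from the public registry.
sudo docker pull ubuntu
# Start a container from it with an interactive terminal.
sudo docker run -i -t ubuntu /bin/bash
```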
You are in your container as root. Containers don't need to boot up or shut down an OS the way VMs do, so the main thing to worry about is disk space. The neat thing about Docker images is the ability to make changes and commit them. Let's create an image for Node.js development. First we define a Dockerfile (it can be found on GitHub):
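The exact Dockerfile lives on the author's GitHub; a minimal sketch of what such a file might look like (package names, file paths and port are assumptions):

```dockerfile
# Start from the Ubuntu base image.
FROM ubuntu
# Install Node.js and npm; package names are assumptions and may
# differ depending on the Ubuntu release.
RUN apt-get update
RUN apt-get install -y nodejs npm
# Copy the app into the image (server.js is a hypothetical file).
ADD server.js /srv/server.js
# Port 8080 is an assumed app port.
EXPOSE 8080
CMD ["node", "/srv/server.js"]
```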
Then we need to execute:
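From the directory containing the Dockerfile, the build command would be along these lines (the image name is illustrative):

```shell
# Build an image from the Dockerfile in the current directory
# and tag it so it can be referenced later.
sudo docker build -t my/nodejs .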
To leave the Node.js app running we must start the container as a daemon. To connect to the container once it is running, we grab its identifier from the list of active containers and attach to it:
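A sketch of that flow, assuming the image name from earlier (`my/nodejs` is illustrative, and `<container-id>` is a placeholder for the ID printed by `docker ps`):

```shell
# Start the container detached (-d) so the app keeps running in the background.
sudo docker run -d my/nodejs
# List running containers to find the container's ID.
sudo docker ps
# Attach to the running container.
sudo docker attach <container-id>
```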
Now we can connect to the Node.js app and see the response:
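For example, from the host (the port is an assumption; use whatever port the app listens on and `docker ps` reports as mapped):

```shell
# Request the app over HTTP; port 8080 is illustrative.
curl http://localhost:8080
```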
Let's assume we would like a full-blown Node.js development container. We can go in and install all the tools, services, etc. that we need.
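Inside a running container this is just ordinary package installation; the package list below is illustrative:

```shell
# Run inside the container: install common development tooling.
apt-get update
apt-get install -y git vim curl build-essential
```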
And then commit the changes made to the container:
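Committing looks roughly like this (`<container-id>` comes from `sudo docker ps -a`; the image name is illustrative):

```shell
# Persist the container's filesystem changes as a new image.
sudo docker commit <container-id> my/nodejs-dev
```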
Now we can use this newly created image as the base for new containers. You can use the sudo docker login command to log in to the online image repository and sudo docker push to upload the image.
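For example (the image name is illustrative):

```shell
# Authenticate against the public registry, then upload the image.
sudo docker login
sudo docker push my/nodejs-dev
```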
The image is now available in the public Docker image repository. Hosting private repositories is also possible and comes in handy in enterprise scenarios. Containers can be deleted using the command:
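Concretely (`<container-id>` is a placeholder, and the image name is illustrative):

```shell
# Remove a stopped container.
sudo docker rm <container-id>
# Remove the image itself, if no longer needed.
sudo docker rmi my/nodejs-dev
```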
Let's see how Docker would fit into a more complicated server scenario: Spark.
Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
To get Spark up and running:
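The build follows the same pattern as before; a sketch, assuming you have obtained a Spark Dockerfile (the image name is illustrative):

```shell
# From the directory containing the Spark Dockerfile:
sudo docker build -t spark .
# Start an interactive container from the resulting image.
sudo docker run -i -t spark /bin/bash
```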
I managed to get Spark working by commenting out the Hadoop-related steps in the Dockerfile, as it reported that hadoop-1.1.2-bin.tar.gz was not a valid archive; since I had no intention of using HDFS, I dropped it (an updated version can be found on GitHub):
Things I liked about Docker:
- In case of a build failure it continues where it stopped – previously downloaded packages are still there.
- A convenient way of setting up infrastructure – much easier and quicker for a developer to get up and running than Puppet, Chef or CFEngine.
- A neat way of pulling, committing and pushing images.
- No time-consuming restarts, as opposed to VMs.
- You don't have to think about RAM when creating containers – they share the host OS's memory.
- Potentially very easy to move between cloud providers – as soon as they support Docker (dotCloud).
Things that would be interesting to try:
- Deploy to live cloud (AWS possibly).
- Explore OpenStack and docker integration (https://github.com/dotcloud/openstack-docker).
- Set up and deploy to a private cloud (maybe Rackspace private cloud, which comes with Chef cookbooks).
Please note that Docker is still under heavy development and should not be used in production.

References: