Chaos testing with Docker and Cassandra on Mac OS X

July 13, 2015

12.12.2015: This blog post refers to an initial version of the eventuate-chaos tools project. Since then, the project has been significantly improved by Gregor Uhlenheuer and now provides a rich and user-friendly tool set for injecting random network and node failures into Apache Cassandra and Eventuate clusters.


In this blog post I summarize the setup of a local Cassandra cluster with Docker on Mac OS X for chaos testing on a single machine. Chaos is generated by a coordinator written in Scala that randomly kills and restarts cluster nodes. I used boot2docker 1.6.2 and Mac OS X 10.9.5 but the following should also work with newer versions. It is assumed that the boot2docker VM is running:

almdudler:~ martin$ boot2docker start
Waiting for VM and Docker daemon to start...
..........ooooooooo
Started.

and Mac OS terminal sessions are initialized with:

almdudler:~ martin$ eval `boot2docker shellinit`

Running the cluster

There are several Cassandra Docker images available. The image used here is the one from the Docker Official Images project. To start a three-node Cassandra 2.1.6 cluster, a seed node is started first:

almdudler:~ martin$ docker run --name cassandra-1 -d cassandra:2.1.6
b647c51ba090cf66fc83919c7918ea827ff28faf3856e484b8fc703b4f520f82

The IP address of the seed node container can be obtained with:

almdudler:~ martin$ SEED=`docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-1`
almdudler:~ martin$ echo $SEED
172.17.0.1

The SEED value is used for starting the other two nodes:

almdudler:~ martin$ docker run --name cassandra-2 -d -e CASSANDRA_SEEDS=$SEED cassandra:2.1.6
4ed90829a438670c8a496681f65078b23d5609f5ea55bead915ed07fdcb30ba5
almdudler:~ martin$ docker run --name cassandra-3 -d -e CASSANDRA_SEEDS=$SEED cassandra:2.1.6
9dfef16988f5950cb21523748cb294f0651ed0781592afe04948abb3044ed186
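
Nodes need some time to join the ring after their containers have started. The following is a minimal sketch for checking this, assuming that nodetool is on the PATH inside the container and that the installed Docker version supports docker exec. The function names count_up_nodes and wait_for_nodes are hypothetical helpers introduced here for illustration:

```shell
# Hypothetical helpers, assuming `nodetool` is available inside the container.

# Count the nodes that nodetool reports as Up/Normal ("UN" in the first column).
count_up_nodes() {
  docker exec "$1" nodetool status 2>/dev/null |
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Poll the seed node until the expected number of nodes is up.
# $1 = expected node count, $2 = maximum number of attempts (default 30).
wait_for_nodes() {
  expected="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    [ "$(count_up_nodes cassandra-1)" -ge "$expected" ] && return 0
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}
```

For the three-node cluster above, wait_for_nodes 3 would return successfully once all three nodes are reported as up, and with a non-zero status if that does not happen within the given number of attempts.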

That’s it, the cluster is up and running. It can be stopped with:

almdudler:~ martin$ docker stop cassandra-1 cassandra-2 cassandra-3
cassandra-1
cassandra-2
cassandra-3

The containers (and all stored data) can be removed with:

almdudler:~ martin$ docker rm cassandra-1 cassandra-2 cassandra-3
cassandra-1
cassandra-2
cassandra-3

I have also started working on a small utilities project that provides scripts for starting and stopping Cassandra clusters. For example, starting a four-node cluster is as simple as:

almdudler:eventuate-chaos martin$ ./cluster-start.sh 4
cassandra-1
cassandra-2
cassandra-3
cassandra-4

The script internally uses the above docker commands for starting and stopping nodes. The cluster can be stopped and the containers removed with:

almdudler:eventuate-chaos martin$ ./cluster-stop.sh 
cassandra-1
cassandra-2
cassandra-3
cassandra-4

Find more details in the project’s README.
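
To illustrate how the docker commands from the previous steps can be combined, here is a minimal sketch of a start helper. It is not the project's actual cluster-start.sh; the function names container_ip and start_cluster and the IMAGE variable are assumptions made for this example:

```shell
# Hypothetical sketch, not the actual cluster-start.sh from the project.
IMAGE="cassandra:2.1.6"

# Print the IP address of a container (same inspect call as shown earlier).
container_ip() {
  docker inspect --format='{{ .NetworkSettings.IPAddress }}' "$1"
}

# Start N nodes named cassandra-1 .. cassandra-N; cassandra-1 is the seed.
start_cluster() {
  count="$1"
  docker run --name cassandra-1 -d "$IMAGE" > /dev/null || return 1
  echo "cassandra-1"
  seed=$(container_ip cassandra-1)
  i=2
  while [ "$i" -le "$count" ]; do
    docker run --name "cassandra-$i" -d \
      -e CASSANDRA_SEEDS="$seed" "$IMAGE" > /dev/null || return 1
    echo "cassandra-$i"
    i=$((i + 1))
  done
}
```

Calling start_cluster 4 would print the four node names, one per line, in the same way as the script output above. The sketch does not wait for nodes to finish joining before starting the next one.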

Accessing the nodes

The started containers are not directly accessible from Mac OS because they are running within the boot2docker VM. The goal, however, is to access these containers directly from Mac OS. This can be achieved with port mapping or custom routing.

Port mapping

With port mapping, container ports are mapped to boot2docker ports so that applications that are running on Mac OS can connect to the boot2docker VM. For example, when starting the seed node with

docker run --name cassandra-1 -d -p 9042:9042 cassandra:2.1.6

a connection to that node can be established with the boot2docker IP address:

almdudler:~ martin$ telnet `boot2docker ip` 9042
Trying 192.168.59.103...
Connected to 192.168.59.103.

Port mapping works well for tools like cqlsh but can be problematic with the DataStax Java Driver, for example. Nodes that are added to the cluster are advertised to the driver with their container IP addresses, which cannot be reached if the driver is running directly on Mac OS. In this case, a custom route can be added to the Mac OS routing table that routes all traffic targeted at Docker containers to the boot2docker VM, as shown in the next section.

Custom routing

Assuming that Docker containers have IP addresses 172.17.x.x, a custom route to the boot2docker VM can be added with:

almdudler:~ martin$ sudo route -n add 172.17.0.0/16 `boot2docker ip`
add net 172.17.0.0: gateway 192.168.59.103

A netstat -nr output should now contain something like:

almdudler:~ martin$ netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
...
172.17             192.168.59.103     UGSc            0        0 vboxnet
...

Docker containers are now directly accessible from Mac OS. Assuming a running seed node container with IP address 172.17.0.1, a telnet client should be able to connect:

almdudler:~ martin$ telnet 172.17.0.1 9042
Trying 172.17.0.1...
Connected to 172.17.0.1.
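
Since the route may have to be added again later (for example after a reboot), it can be convenient to wrap the commands above in a small helper that only adds the route when it is missing. This is a minimal sketch under the same 172.17.x.x assumption; the function name ensure_route is hypothetical, and the check relies on netstat printing the destination as 172.17, as in the output above:

```shell
# Hypothetical helper: add the route to the Docker network only if it is missing.
ensure_route() {
  gateway=$(boot2docker ip) || return 1
  # Look for a routing table entry "172.17 <gateway> ..." as printed by netstat -nr.
  if netstat -nr | awk -v gw="$gateway" \
      '$1 == "172.17" && $2 == gw { found = 1 } END { exit !found }'; then
    echo "route to 172.17.0.0/16 via $gateway already present"
  else
    sudo route -n add 172.17.0.0/16 "$gateway"
  fi
}
```

The sketch does not handle a stale route that points at a different gateway address; in that case the old route would have to be removed first.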

Generating chaos

For randomly stopping and restarting nodes, the utilities project provides a coordinator application named ChaosCluster. It can be started from sbt and configured with the parameters defined in reference.conf. Running ChaosCluster with default settings first starts a four-node cluster:

almdudler:eventuate-chaos martin$ sbt
[info] Loading global plugins from /Users/martin/.sbt/0.13/plugins
[info] Loading project definition from /Users/martin/eventuate-chaos/project
[info] Set current project to eventuate-chaos (in build file:/Users/martin/eventuate-chaos/)
> runMain com.rbmhtechnology.eventuate.chaos.ChaosCluster
[info] Running com.rbmhtechnology.eventuate.chaos.ChaosCluster 
Writing /Users/martin/.boot2docker/certs/boot2docker-vm/ca.pem
Writing /Users/martin/.boot2docker/certs/boot2docker-vm/cert.pem
Writing /Users/martin/.boot2docker/certs/boot2docker-vm/key.pem
cassandra-1
cassandra-2
cassandra-3
cassandra-4
Cluster started. Press any key to start chaos ...

Pressing any key after the cluster has started generates chaos by randomly stopping and (re)starting nodes, except for the seed node (cassandra-1):

cassandra-4
cassandra-2
Node(s) stopped. Press any key to stop cluster ...
cassandra-4
cassandra-2
Node(s) started. Press any key to stop cluster ...
cassandra-3
Node(s) stopped. Press any key to stop cluster ...
cassandra-3
Node(s) started. Press any key to stop cluster ...

Pressing any key a second time stops the cluster and removes all containers:

cassandra-1
cassandra-2
cassandra-3
cassandra-4
Cluster stopped
[success] Total time: 77 s, completed 11.07.2015 17:12:22

The ChaosCluster application uses the scala.sys.process API for executing docker commands (see the ChaosCommands trait for details). Later versions of the project will additionally provide utilities for running distributed Eventuate applications in Docker containers that are also subject to random failures.
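
The stop/restart cycle described in this section can be approximated in plain shell. The following is a minimal sketch and not how ChaosCluster is implemented: the function names pick_victims and chaos_round, the 50% selection probability and the pause length are all assumptions made for illustration. As in the output above, the seed node cassandra-1 is never selected:

```shell
# Hypothetical sketch of one chaos round; not the ChaosCluster implementation.

# Print a random, non-empty subset of the non-seed nodes cassandra-2 .. cassandra-N.
# $1 = cluster size N (at least 2), $2 = optional random seed.
pick_victims() {
  awk -v n="$1" -v seed="${2:-$(date +%s)}" 'BEGIN {
    srand(seed)
    picked = 0
    for (i = 2; i <= n; i++) {
      if (rand() < 0.5) { print "cassandra-" i; picked++ }
    }
    # Guarantee at least one victim.
    if (picked == 0) print "cassandra-" (2 + int(rand() * (n - 1)))
  }'
}

# Stop the selected nodes, wait, then start them again.
# $1 = cluster size N, $2 = pause in seconds (default 10).
chaos_round() {
  victims=$(pick_victims "$1")
  # $victims is intentionally unquoted so that each name becomes one argument.
  docker stop $victims
  echo "Node(s) stopped."
  sleep "${2:-10}"
  docker start $victims
  echo "Node(s) started."
}
```

Running chaos_round 4 in a loop against the four-node cluster would repeatedly stop and restart a random selection of cassandra-2, cassandra-3 and cassandra-4.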
