In this example we use docker-compose to assemble a six-node Cassandra cluster with two rings/datacenters on a single machine to run failover tests. The cluster can easily be extended to 10 or more nodes, depending on your requirements. It is advisable to have a machine with at least 16 GB of RAM.
A basic understanding of Cassandra and Docker is required to follow the steps below. If you are new to Cassandra, please have a look at the Cassandra and/or Docker documentation.
Using the bitnami/cassandra image and the file docker-compose-cass-cluster.yaml we build a Cassandra cluster with six nodes and two rings. Note that the first three nodes {1,2,3} are bound to the datacenter/ring "rz1" and the remaining three {4,5,6} to the datacenter/ring "rz2".
An external Docker network called cassnet is used to provide connectivity to the clients.
# create the cassnet network. It is used by clients to interact with the cluster
docker network create -d bridge cassnet
# run the 6-node Cassandra cluster on this network. This may take some time, so please be patient ;-)
docker-compose -f docker-compose-cass-cluster.yaml up -d
Please wait until the cluster is fully established! This could take a few minutes. You can verify this by inspecting the logs.
# check the logs {cassandra1,cassandra2,cassandra3,cassandra4,cassandra5,cassandra6}
$ docker logs cassandra1
Now we need to make some small changes to our newly established Cassandra cluster.
# Go to one of the nodes using docker exec (with "docker ps" we can find the container ID, e.g. 371cddb87cac) or attach a shell in VS Code
#
docker exec -it cassandra1 bash
# now we should be on the node ;-) so we can check the status of the cluster. It should look similar to the following output:
# of course we can also use "docker exec cassandra1 nodetool status"
I have no name!@62cdbadad6f1:/$ nodetool status
Datacenter: rz1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 428.79 KiB 256 64.5% 737f400b-94ec-46a2-a6ee-3fa0e576cf30 rack1
UN 172.18.0.7 483.83 KiB 256 67.3% b8e4918a-12ac-41ee-ae7e-09c856dda62c rack1
UN 172.18.0.4 473.8 KiB 256 68.2% b38129cc-c409-4e39-be4d-8a17c2479272 rack1
Datacenter: rz2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.2 482.96 KiB 256 64.5% e0fc50b5-6225-4df2-9265-aa0b59a936ff rack2
UN 172.18.0.6 628.45 KiB 256 65.7% d4febb11-cdbb-4ca0-b7ff-910077708101 rack2
UN 172.18.0.5 575.22 KiB 256 69.8% 8aa76583-d5a1-4669-a1b6-1f40d043dc56 rack2
# connect with the default user: cassandra first
I have no name!@3017b7ebc38e:/$ cqlsh cassandra1 -u cassandra -p cassandra
Connected to aseno-cass at cassandra1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>
# replicate the system_auth keyspace across both datacenters
cassandra@cqlsh> ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'rz1' : 2, 'rz2' : 2};
# create a new super user
# https://docs.datastax.com/en/security/5.1/security/Auth/secCreateRootAccount.html
cassandra@cqlsh> CREATE ROLE aseno WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'password';
# verify
cassandra@cqlsh> select peer, data_center, host_id, preferred_ip, rack, rpc_address from system.peers;
# create a new keyspace "myks" with NetworkTopologyStrategy and two rings rz1 and rz2 for our tests
CREATE KEYSPACE IF NOT EXISTS myks with replication = { 'class': 'NetworkTopologyStrategy', 'rz1': 2, 'rz2': 2};
# disconnect and run repair on the keyspace system_auth
nodetool repair system_auth
# done ;-)
Alternative Connection
$ docker run -it --rm \
--network cassnet \
bitnami/cassandra:latest cqlsh --username aseno --password password cassandra1
See also https://github.com/bitnami/bitnami-docker-cassandra
Once again, we do not want to reinvent the wheel, so we borrow a simple datastax/spring-data-cassandra example and make some adjustments.
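For orientation: judging from the INSERT statement that shows up in the logs further below, the test application persists orders into a starter_orders table. The following is only a sketch of what such an entity could look like with Spring Data Cassandra; the class name, field names and column types are assumptions for illustration, not the borrowed example's actual code.
// Hypothetical sketch of an entity behind the starter_orders table
// (column names taken from the INSERT statement in the logs below; types are assumptions)
import java.math.BigDecimal;
import java.time.Instant;
import java.util.UUID;

import org.springframework.data.cassandra.core.cql.PrimaryKeyType;
import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("starter_orders")
public class Order {

    // partition key: all products of one order live on the same replicas
    @PrimaryKeyColumn(name = "order_id", ordinal = 0, type = PrimaryKeyType.PARTITIONED)
    private UUID orderId;

    // clustering key: one row per product within the order
    @PrimaryKeyColumn(name = "product_id", ordinal = 1, type = PrimaryKeyType.CLUSTERED)
    private UUID productId;

    @Column("added_to_order_at")
    private Instant addedToOrderAt;

    @Column("product_name")
    private String productName;

    @Column("product_price")
    private BigDecimal productPrice;

    @Column("product_quantity")
    private int productQuantity;

    // getters and setters omitted for brevity
}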
If you want to debug and analyse the code, the best way is to use the VS Code Remote - Containers extension. Before opening the folder in a container, make sure the Cassandra cluster is up and running and all nodes are available. You can then run and debug the test application within a container. Have a look at devcontainer.json and launch.json for more details.
Alternatively you can build and run the application using docker-compose again. This is the preferred way if you want to run multiple test applications connecting to different datacenters. Have a look at admin.sh.
# in main folder
$ mvn clean package -DskipTests
$ cd container
$ cp ../target/*.jar .
$ docker build -t my-app .
To run the application manually, we only need to run docker-compose with the application file. Since we might want to attach a second test application connected to the other ring/datacenter, we use docker-compose again.
# run the test application
docker-compose -f docker-compose.yaml up -d
# see the logs
docker logs -f my-app
# a good output looks like ;-)
INFO [main] 2022-01-07 17:50:09,333 CassandraDaemon.java:650 - Startup complete
INFO [Thread-2] 2022-01-07 17:50:09,334 ThriftServer.java:133 - Listening for thrift clients...
Now that the application is running, we can visit the Swagger UI under host:port/swagger-ui/, e.g. http://localhost:8080/swagger-ui/. Here we can execute the usual GET, POST, etc. requests and create some content in the database. My personal recommendation is to use Postman, since it is one of the best tools for executing REST commands.
At this point we need to make sure that the application is doing its job properly. Since all Cassandra nodes are up and running, we should be able to read and write with consistency level EACH_QUORUM. Have a look at the variable CASS_CL in the docker-compose.yaml file.
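How CASS_CL ends up in the driver configuration depends on the application. As a minimal sketch, assuming a Spring Boot setup with the auto-configured CqlSession (and with the CASS_RETRY_POLICY variable name being purely hypothetical), the environment variables could be mapped to driver options like this:
// Sketch only: wire environment variables into the DataStax driver configuration.
// DriverConfigLoaderBuilderCustomizer is Spring Boot API; the bean wiring and the
// CASS_RETRY_POLICY variable are assumptions about this project, not its real code.
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import org.springframework.boot.autoconfigure.cassandra.DriverConfigLoaderBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CassandraDriverConfig {

    @Bean
    public DriverConfigLoaderBuilderCustomizer driverOptionsFromEnvironment() {
        // consistency level for all requests, e.g. EACH_QUORUM
        String consistency = System.getenv().getOrDefault("CASS_CL", "EACH_QUORUM");
        // retry policy class, e.g. de.aseno.spikes.ConsistencyRetryPolicy
        String retryPolicy = System.getenv().getOrDefault(
                "CASS_RETRY_POLICY", // hypothetical variable name
                "com.datastax.oss.driver.internal.core.retry.DefaultRetryPolicy");

        return builder -> builder
                .withString(DefaultDriverOption.REQUEST_CONSISTENCY, consistency)
                .withString(DefaultDriverOption.RETRY_POLICY_CLASS, retryPolicy);
    }
}
The same values could of course also come from application.conf or application.yml; the point is only that the consistency level and the retry policy are injected from the environment, which is what the docker-compose files toggle.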
The easiest way to detach a datacenter is to disable the Cassandra gossip protocol on its nodes using the nodetool command disablegossip. This effectively marks the nodes as down.
# disable gossip on nodes 4,5,6 --> datacenter rz2 going down!
docker exec -it cassandra4 nodetool disablegossip
docker exec -it cassandra5 nodetool disablegossip
docker exec -it cassandra6 nodetool disablegossip
# verify that the ring is down --> DN (DOWN NORMAL)
docker exec -it cassandra1 nodetool status
Datacenter: rz1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 515.73 KiB 256 64.5% 737f400b-94ec-46a2-a6ee-3fa0e576cf30 rack1
UN 172.18.0.7 581.28 KiB 256 67.3% b8e4918a-12ac-41ee-ae7e-09c856dda62c rack1
UN 172.18.0.4 582.64 KiB 256 68.2% b38129cc-c409-4e39-be4d-8a17c2479272 rack1
Datacenter: rz2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 172.18.0.2 579.55 KiB 256 64.5% e0fc50b5-6225-4df2-9265-aa0b59a936ff rack2
DN 172.18.0.6 708.95 KiB 256 65.7% d4febb11-cdbb-4ca0-b7ff-910077708101 rack2
DN 172.18.0.5 706.99 KiB 256 69.8% 8aa76583-d5a1-4669-a1b6-1f40d043dc56 rack2
If we now run the create-order command (POST host:port/orders ...), we will see a DataStax UnavailableException wrapped in the Spring Data CassandraInsufficientReplicasAvailableException, since our application is still using consistency level EACH_QUORUM.
[2022-01-04 14:49:55,233] 22397 [http-nio-8080-exec-1] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.data.cassandra.CassandraInsufficientReplicasAvailableException: Query; CQL [INSERT INTO starter_orders (order_id,product_id,added_to_order_at,product_name,product_price,product_quantity) VALUES (?,?,?,?,?,?)]; Not enough replicas available for query at consistency EACH_QUORUM (2 required but only 0 alive); nested exception is com.datastax.oss.driver.api.core.servererrors.UnavailableException: Not enough replicas available for query at consistency EACH_QUORUM (2 required but only 0 alive)] with root cause
Now, go ahead and change the retry policy from "com.datastax.oss.driver.internal.core.retry.DefaultRetryPolicy" to "de.aseno.spikes.ConsistencyRetryPolicy" in the docker-compose.yaml file and run the same command again.
[2022-01-04 14:59:19,972] 17686 [http-nio-8080-exec-1] DEBUG o.s.d.c.core.cql.CqlTemplate - Executing prepared statement [INSERT INTO starter_orders (order_id,product_id,added_to_order_at,product_name,product_price,product_quantity) VALUES (?,?,?,?,?,?)]
[2022-01-04 14:59:19,993] 17707 [my-sess-io-5] INFO d.a.spikes.ConsistencyRetryPolicy - onUnavailableVerdict Alive:0 Retrycount:0 + CL: EACH_QUORUM
[2022-01-04 14:59:19,994] 17708 [my-sess-io-5] WARN d.a.spikes.ConsistencyRetryPolicy - EACH_QUORUM could not be reached -> Downgraded to LOCAL_QUORUM. Alive:0 Retrycount:0
Our RetryPolicy adjusted the consistency level to make the insert possible.
How do we deal with a datacenter failure? DataStax comes with two strategies, DefaultRetryPolicy and ConsistencyDowngradingRetryPolicy, which can be extended with custom retry policies. That's what we did. See also datastax-retries and retry-configuration.
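The project's de.aseno.spikes.ConsistencyRetryPolicy itself is not reproduced here, but the idea can be sketched roughly as follows, assuming Java driver 4.10+ and extending DefaultRetryPolicy. The class name and details are illustrative only, not the actual implementation.
// Minimal sketch of a consistency-downgrading retry policy (not the project's
// actual implementation): on UnavailableException, retry the statement once
// on the same node with LOCAL_QUORUM instead of EACH_QUORUM.
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.context.DriverContext;
import com.datastax.oss.driver.api.core.cql.Statement;
import com.datastax.oss.driver.api.core.retry.RetryDecision;
import com.datastax.oss.driver.api.core.retry.RetryVerdict;
import com.datastax.oss.driver.api.core.session.Request;
import com.datastax.oss.driver.internal.core.retry.DefaultRetryPolicy;

public class DowngradingRetryPolicy extends DefaultRetryPolicy {

    // the driver instantiates retry policies via this constructor signature
    public DowngradingRetryPolicy(DriverContext context, String profileName) {
        super(context, profileName);
    }

    @Override
    public RetryVerdict onUnavailableVerdict(
            Request request, ConsistencyLevel cl, int required, int alive, int retryCount) {
        // downgrade only once, and only for EACH_QUORUM requests
        if (retryCount == 0 && cl == DefaultConsistencyLevel.EACH_QUORUM) {
            return new RetryVerdict() {
                @Override
                public RetryDecision getRetryDecision() {
                    return RetryDecision.RETRY_SAME;
                }

                @Override
                @SuppressWarnings("unchecked")
                public <T extends Request> T getRetryRequest(T previous) {
                    if (previous instanceof Statement) {
                        // re-issue the statement with the downgraded consistency level
                        return (T) ((Statement<?>) previous)
                                .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM);
                    }
                    return previous;
                }
            };
        }
        // everything else keeps the default behaviour
        return super.onUnavailableVerdict(request, cl, required, alive, retryCount);
    }
}
Because the verdict downgrades only once and only for EACH_QUORUM, regular LOCAL_QUORUM traffic and other errors still follow the default retry behaviour.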
OK, we have inserted our data, but how are the datacenters/rings synchronized once the nodes 4,5,6 are up and running again? :-) Well, Cassandra has two options - read repair and manual repair (also known as anti-entropy repair). Have a look at: Cassandra-Repair.
To clean up the test application and the Cassandra infrastructure, go to the root folder and follow the steps below.
# remove the test application
docker-compose -f docker-compose.yaml down
# remove the cassandra cluster infrastructure
docker-compose -f docker-compose-cass-cluster.yaml down
# remove the docker volumes (or use ./container/remove-stuff.sh)
docker volume rm spring-cassandra_cassandra1_data
docker volume rm spring-cassandra_cassandra2_data
docker volume rm spring-cassandra_cassandra3_data
docker volume rm spring-cassandra_cassandra4_data
docker volume rm spring-cassandra_cassandra5_data
docker volume rm spring-cassandra_cassandra6_data
# remove the cassnet docker network
docker network rm cassnet