Subject/Title: DevOps and Test Automation: Docker saves the day!
Type of Content: A blog post that covers the DB Best experience with Docker (based on the Cassandra to DynamoDB migration project).
Target audience: Developers (candidates to join DB Best), IT professionals, potential customers.
Summary of Content: Deep technical dive into our projects.
Introduction
While migrating from Cassandra to DynamoDB, we needed to run many tests, and setting up the test environment became a bottleneck. The setup took a lot of time, and we couldn’t run regression tests in parallel. We used Docker to automate the environment setup.
Background on Docker
Docker lets you create a container with the required environment. You can also use Docker to:
- Isolate software
- Simplify development, testing, and deployment
- Configure the environment
- Scale applications
You can browse over 100,000 container images on Docker Hub, and you can create your own images from scratch or on top of existing images.
Background on process isolation
There are two proven ways to isolate processes: virtualization and containerization.
Virtualization
You can use this approach to isolate processes running on one host. Also, you can use virtual machines to run applications for different platforms.
In this case, all virtual machines share the physical resources of the host. These include:
- Processor
- RAM
- Disk storage
- Network interfaces
Each VM runs its own operating system along with its applications. The main drawback of this approach is that a large share of system resources goes to supporting the guest operating systems, so the applications you run there may lack the resources they need.
Containerization
The main idea of this approach is to create an isolated container within one operating system and run your application inside this container. Every container uses its own part of the operating system, including:
- File system
- Process tree
- Network interfaces
So, the application running inside the container behaves as if it were the only one running under this operating system.
Containerization proved preferable for our tasks, and Docker is arguably the most popular tool for creating containers for your applications.
Original problem
Initially, we needed several virtual machines to run our application and regression tests.
Setting up and maintaining these VMs requires in-depth knowledge of various Linux-based operating systems.
In particular, we ran the source Cassandra database on one of the VMs. We needed another VM to host a new Cassandra node and to copy the source data there. We also needed to set up data replication from the source database to this temporary node.
Once the data on this temporary node was ready for extraction, we set up yet another VM to run the data extraction agent.
During the data extraction, the agent created log files, directories, and so on. So, to repeat the operation for testing purposes, we had to clean up this VM and bring it back to its initial state. This meant not only cleaning up the data but also editing the settings. In effect, we had to set up a clean VM from scratch, which took at least 30 minutes when done manually.
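To make the cleanup burden concrete, here is a minimal Python sketch of the kind of reset described above. It is not the procedure we actually followed: the directory layout, the file names, and the `reset_workdir` helper are all hypothetical, and a real VM reset involves more than one directory.

```python
import shutil
import tempfile
from pathlib import Path


def reset_workdir(workdir: Path, pristine_settings: Path) -> None:
    """Bring workdir back to its initial state.

    Both paths are hypothetical: workdir holds everything an extraction run
    produced, pristine_settings holds the untouched configuration files.
    """
    # Drop logs, temporary directories, and edited settings from the last run.
    if workdir.exists():
        shutil.rmtree(workdir)
    # Restore the original settings so the next run starts from a known state.
    shutil.copytree(pristine_settings, workdir)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    pristine = root / "pristine"
    pristine.mkdir()
    (pristine / "agent.conf").write_text("mode=initial\n")

    work = root / "work"
    work.mkdir()
    (work / "extract.log").write_text("leftover from a previous run\n")

    reset_workdir(work, pristine)
    print(sorted(p.name for p in work.iterdir()))
```

Even when scripted, a reset like this has to be kept in sync with every file and setting the agent touches, which is part of what made the VM-based setup costly to maintain.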
As a result, we had to keep quite a few virtual machines up and running on AWS. However, the load on these VMs was uneven, and we could not use them in parallel. We clearly needed a smarter way to manage our resources. And here it is:
The solution
We created a Docker container with a local repository to replace the virtual machine we used to extract the source data. This container includes all the required software: an Apache Cassandra node, Java, ssh, sshfs, and so on. We also preconfigured all the access settings.
When we run this container, it executes a script that sets up the environment and mounts the required Cassandra nodes. The whole process takes no more than 1 second! Even more important, when we need to run the test again, we simply stop the container and run it once more. Moreover, different users can now create several containers and use them simultaneously, and different containers can use different Cassandra nodes.
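The workflow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual tooling: the image name `extraction-agent:latest`, the `CASSANDRA_HOST` variable, the host addresses, and the `build_run_command` helper are all hypothetical. The sketch only builds the `docker run` command lines, using standard flags (`--rm`, `--detach`, `--name`, `--env`), and does not start anything. Any extra privileges that sshfs may need inside a container are omitted.

```python
import shlex
import uuid


def build_run_command(image, cassandra_host, user):
    """Return (container_name, argv) for one throwaway extraction container."""
    # A unique name per run lets several users start containers side by side.
    name = f"extract-{user}-{uuid.uuid4().hex[:8]}"
    argv = [
        "docker", "run",
        "--rm",        # remove the container on exit, so each run starts clean
        "--detach",    # run in the background
        "--name", name,
        # Hypothetical variable that a startup script could read to decide
        # which Cassandra node to mount.
        "--env", f"CASSANDRA_HOST={cassandra_host}",
        image,
    ]
    return name, argv


if __name__ == "__main__":
    # Two users, two different Cassandra nodes, two independent containers.
    for user, host in [("alice", "10.0.0.11"), ("bob", "10.0.0.12")]:
        name, argv = build_run_command("extraction-agent:latest", host, user)
        print(shlex.join(argv))
```

Because every container gets its own name and is removed on exit, repeating a test is just a matter of stopping the container and issuing the same command again.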
Future plans
We plan to improve the existing solution with the following features:
- Create separate Docker images for different versions of Cassandra (this reduces the risk of accidental data changes during test runs)
- Create Docker images for the various agents so that they can start automatically on request. This will allow several developers to work in parallel with the same Cassandra data center without interfering with each other
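As a rough idea of what the first item could look like, the Python sketch below generates one Dockerfile and one `docker build` command per Cassandra version. It assumes the images would be based on the official `cassandra` image from Docker Hub. The version list, the `cassandra-test` image name, and the helper functions are hypothetical, and the project-specific setup steps are left out.

```python
# Example tags of the official cassandra image; adjust to the versions you test.
VERSIONS = ["3.11", "4.1"]


def image_tag(version):
    """Local image name for one Cassandra version (hypothetical naming)."""
    return f"cassandra-test:{version}"


def dockerfile_for(version):
    """Minimal Dockerfile text that pins the base image to one version."""
    return (
        f"FROM cassandra:{version}\n"
        "# Project-specific setup steps would follow here.\n"
    )


def build_command(version, dockerfile_path, context_dir="."):
    """argv for building the image; --file and --tag are standard flags."""
    return [
        "docker", "build",
        "--file", str(dockerfile_path),
        "--tag", image_tag(version),
        context_dir,
    ]


if __name__ == "__main__":
    for version in VERSIONS:
        path = f"Dockerfile.{version}"
        print(f"# {path}")
        print(dockerfile_for(version), end="")
        print(" ".join(build_command(version, path)))
```

Pinning each image to a single Cassandra version keeps test runs for one version from touching the data of another.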
Conclusion
Docker has become one of the most important tools for developers. With Docker, you can reduce the usage of system resources, and you don’t need to support multiple versions of operating systems.
Docker simplifies and speeds up the development and testing processes. It is hard to overestimate its importance: we would compare the benefits of this container service with those of Git.