by Jacob Beasley, Director of DevOps
Kubernetes is an open source system for automating deployments, scaling, and managing applications. SQL Databases are a common way of storing structured data for software applications, yet most companies find themselves deploying SQL databases using traditional virtualization technology or managed cloud tools such as Amazon RDS or Azure Managed SQL.
In this article, we will discuss why someone might deploy databases on Kubernetes as well as the best practices to ensure databases are highly available and performant. We will also briefly touch on common maintenance tasks such as backups, failovers, or upgrades.
Why deploy databases on Kubernetes?
There are two major reasons to deploy databases on Kubernetes: Compliance and Simplicity.
For many forms of compliance (for example, HIPAA), end-to-end encryption as well as two-way certificate validation (mutual TLS) is required to ensure that only authorized applications are talking to databases and that authorized data is only sent to safe places. Simple usernames and passwords are not enough. Tools like Istio can run on Kubernetes and layer in compliance through mutual-TLS, making it easier than ever to achieve HIPAA compliance.
Second, using Kubernetes can be simple and avoid adding on new kinds of infrastructure, so long as you are already using Kubernetes. With Kubernetes, you can easily deploy a database, declare how much storage it should have and where it should be located, decide what node port it should be available on, and 15 seconds later you have yourself a database that you can use in your test or production environments.
How does this compare to something like Amazon RDS?
Amazon RDS and Azure Managed SQL are easy ways to spin up a production-ready SQL Server in the cloud with some kind of scalable storage solution (SAN-based typically), backups, possibly replication, configuration management, upgrades, and alerting/metrics dashboards already setup.
Deploying on Kubernetes is similar, though you typically would use a Kubernetes operator instead of a cloud GUI or Command Line tool to do the deploying. You would use a CICD pipeline to manage configuration and rollouts, the cloud backup tools for backup, a Kubernetes operator to manage replication and upgrades, and Prometheus with alert manager to manage metrics dashboards and send alerts.
There are advantages to both approaches, and it is up to you to weigh the costs of each. Especially in lower-environments, Kubernetes may be a more cost-effective approach as well as have the benefit of allowing developers to largely own and spin up new database severs at will in non-prod environments. Its not uncommon, for example, to use one of the cloud-based managed SQL environments in production, but to let developers spinup whatever infrastructure they need in non-prod environments as they see fit.
One final thought on this point – when working in compliant settings, running databases in Kubernetes may have certain advantages due to network policies and mutual-tls, which is available with tools like Istio.
How do I perform an upgrade?
Typically, to perform an upgrade, you will simply deploy a new version of the SQL Server software (update the version in your deployment, operator, or helm chart), wait for it to roll out, and after it boots up successfully traffic will be shifted by Kubernetes over to the new version of whatever SQL Server you are deploying.
Learn more about Kubernetes Deployment Strategies
How does my application connect to my database?
You can expose your database as a Kubernetes Service and use network policies to only grant certain applications the ability to access your SQL database server. This has the same effect as a traditional firewall would have, but also takes advantage of the automatic scaling and redundancies available on Kubernetes.
If DBAs need to connect to the database for any reason, you can configure the database to be exposed through a node port and then connect using whatever SQL client you are most comfortable with.
How do I ensure my data is safe?
Typically, people are used to hosting applications in Kubernetes that do not store data directly on Kubernetes. SQL Databases usually store data on disk somewhere, but what happens if the virtual machine that is hosting your Kubernetes application goes away and its hard disk is lost? That would mean that you would lose your data, right?
In order to ensure your data is safe, you should utilize a SAN (storage area network) so that your data is not stored on the virtual machines directly. It is important to use a SAN (storage area network) instead of a NAS (network attached storage) because a SAN stores data at the block-level and typically has far better performance (oftentimes similar to that off an SSD on a local host machine, or better in some cases if they are striping across multiple underlying SSDs).
When using cloud providers like Amazon Web Services, Google Cloud, or Azure, SANs are available under various titles related to Cloud Storage. For example, on Amazon Web Services, their SAN solution is called Elastic Block Storage. They do not formally call it a SAN, but under the hood it is effectively a SAN and performs as such.
If using MySQL, Microsoft SQL Server, or PostgreSQL server, you just have to configure a Volume Claim for your deployment to mount the NAS or local Virtual Machine’s disk at whatever location that the particular SQL server normally stores its databases and their tables. Then, when the SQL Server boots up, it will automatically read and detect the current state of the database from the file system.
Comparison of Cloud Storage Options
How do I perform backups?
There are several ways to perform backups.
The simplest solution is to backup the underlying storage (SAN) using built-in tools from the cloud provider, such as taking nightly backups of an Elastic Block Storage disk. This closely mimics what you would do if you were running the database on a traditional VM. This basically bypasses the SQL server you are using and will work regardless of whether you are using PostgreSQL, MySQL or SQL Server. Also, this is very easy to automate if using a major cloud provider as its just a matter of checking a box to enable rolling backups of the data.
The other option is to use a tool like mysqldump on a regular basis to perform a backup. This is viable, though more scripting and coordination would be required by you to get this working.
How do I ensure high availability?
High availability means being able to recover from server outages almost instantly. The simplest solution is to provide a health check to validate that the SQL Server is still running as part of your pod. If that fails, then Kubernetes will automatically reboot it and, if a host virtual machine goes away or goes down for maintenance, it will automatically migrate the pod to another virtual machine in 15-30 seconds. This, however, does not have redundancy and disruptions to your virtual machine will cause a brief moment of downtime.
Another method would be to disable all in-memory caching and run multiple, horizontally scaled instances of your database server. Note that this only works if you disable all in-memory caching, so in effect the SQL Server always hits the disk and never servers queries out of an in-memory query cache. Then, if a virtual machine goes down, traffic will instantly be routed to another virtual machine. Also, upgrades/downgrades could be done with zero downtime. However, this does take a performance hit because in-memory caching would be unavailable. SQL Transactions can also add some complexity, so be sure to refer to your database vendors’ documentation.
How do I monitor my database?
Typically, Kubernetes Operators or Helm Charts come with Prometheus metrics scraper and dashboards, so if you are already using Prometheus, be sure to setup and enable the metrics scraping so that you can get interactive dashboards to help you manage and triage any issues you might encounter.
More about Prometheus dashboards from Percona
How do I get started?
The easiest way to get started is to use a popular helm chart or Kubernetes operator to deploy your SQL database. Below are some helpful links to get you started, organized by database vendor:
PostgreSQL
Microsoft SQL Server
SQL Server Operator (Microsoft)
MySQL / Maria DB
MySQL Operator (Oracle)
See more operators on OperatorHub.io
Are you ready to enhance your DevOps capabilities?
If so, reach out to our team! We deliver a full range of DevOps services and solutions that help propel transformative change for our customers. Our range of capabilities includes:
- Software Development
- Site Reliability Engineering
- Build and Support Services
- Secure Software and Systems
- Respond and Support Services
- Compliance and Security Capabilities
If there’s software that needs to be built, secured, or improved, our team can help. Feel free to reach out to us at connectwithu[email protected] or by filling out our brief Contact Us inquiry form if you’re interested in learning more about our DevOps services. We’re confident we can help you build reliable, secure, and powerful software that can truly move the needle for your organization and your customers.