Kubernetes: Should your organization adopt K8S for your infrastructure?
First released by Google in 2014, Kubernetes has quickly become one of the most popular hosting platforms. Despite this, a trend I see is that enterprise adoption has been light. It’s been out nearly a decade, and many organizations still avoid or ignore it. Should your organization adopt the newest addition to hosting? Let’s peel this onion and find out.
The intended audience of this article is C-Level executives, Directors, and VPs who want to learn more about Kubernetes without deep technical knowledge. With this in mind, I will be cutting corners and explaining things in relatable rather than strictly technical terms. This does mean that some explanations will not be 100% accurate on a technical level.
Common Infrastructure Configurations
With the introduction of many ‘cloud native’ hosting platforms like AWS and Azure, the world of infrastructure has grown considerably more complicated. With this in mind, I’m going to focus on the more traditional hosting models: Bare Metal and Virtual Machines. I will also include containers and Docker Compose for comparison purposes.
For the purposes of this article, a Bare Metal Machine refers to a single physical ‘server’. A Virtual Machine is a single ‘server’ running virtualization software like VMware that hosts multiple VMs. I make this distinction because most cloud providers are virtualized. When you go to AWS and create a new server, nobody goes out to the data center and installs a physical machine anymore; it’s purely virtual.
What are containers?
While the term ‘containers’ is fairly recent, the concept has existed for quite a while. If you’ve ever heard of ‘Jails’ in FreeBSD, the concept is comparable.
A ‘Jail’ is a virtualization layer (of sorts) on the server that prevents two users or two pieces of software from knowing that the other exists. Done correctly, you can run multiple pieces of software in isolation from each other, and they will never know the others exist. This isolation goes down to the kernel, the heart of the server, so you can have multiple processes running in one jail that are all visible to that jail, but not to others.
Containers take this concept of jails to the next level. They increase the separation of the software and introduce an ephemeral nature to the running processes. Because containers are ephemeral, once a container ‘dies’, it is as if it never existed.
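To see this ephemerality in action, here is a quick sketch using Docker (the most common container runtime, discussed below):

```bash
# Start a container from the stock 'alpine' image and write a file inside it.
docker run --name demo alpine sh -c 'echo hello > /tmp/data.txt'

# Start a brand-new container from the same image. The file is gone,
# because every container starts fresh and leaves no trace when it dies.
docker run --rm alpine cat /tmp/data.txt   # fails: No such file or directory
```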
How do Containers differ from Virtual Machines?
It can be easy to mislabel ‘containers’ as Virtual Machines. They do share many similarities in the isolation they provide. However, there is one key difference that sets them apart, and that is the isolation layer.
In a Virtual Machine, a set amount of ‘hardware’, that is to say, RAM and CPU, is allocated to the VM. The virtualization software prevents the VM from getting any more or less than it is configured for. This allocation happens when the VM starts; once started, it cannot change. In this scenario, everything about the hardware of the VM is emulated. In other words, it’s fake.
A container, on the other hand, actually shares the Kernel, or the heart, of the OS. Because the Kernel is shared, the hardware like CPU and RAM aren’t being split or emulated. The full amount of the server is theoretically available to the container.
A rather simple, albeit crude, way of explaining this is to think of the train sets you typically see at Christmas. If you were to set this up like Virtual Machines, you would buy two train sets and create two circles of track that aren’t connected. To ‘change’ a VM, you would have to move the engine from one track to the other. In a container setup, you would create a figure eight instead. You would couple the two engines together into one chain, double-heading as you often see on real trains, and this train would then travel back and forth between the two ‘containers’, the circles of the figure eight.
Bonus: When Docker was first released, both Windows and OSX had to run a VM behind the scenes to run Docker. This was because Windows does not have a Linux kernel, and OSX’s BSD-based kernel isn’t compatible with Docker. Even today, a sublayer of virtualization exists, via Hyper-V or WSL 2 on Windows and a lightweight Linux VM on OSX. Only on an actual Linux machine can you get the full benefits of Docker with none of these micro-latencies.
What is Docker Compose?
Docker is a command-line wrapper around the Docker Engine for running containers. Docker Compose used to be a separate tool but is now included as a sub-command of the Docker CLI (‘docker compose’). Docker Compose uses a YAML file to define an environment: which containers to run and how those containers are configured.
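For illustration, here is a minimal, hypothetical docker-compose.yml defining a two-container environment, a web server and a cache:

```yaml
# docker-compose.yml
services:
  web:
    image: nginx:1.25    # which container to run
    ports:
      - "8080:80"        # its configuration: publish port 80 on host port 8080
  cache:
    image: redis:7
```

Running ‘docker compose up -d’ then brings the entire environment up with one command.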
Kubernetes Overview
What is Kubernetes?
Kubernetes, or K8S for short, is a container orchestration platform. It works on the principle of establishing a ‘required state’ (officially called the ‘desired state’) and then automatically changing the current state until it matches. The ‘pod’, the most common type of resource in K8S, is made up of one or more containers. If you tell K8S you need 3 copies of a given pod, but there is currently only one copy, then K8S will automatically spin up 2 new pods. Likewise, if you tell K8S you need 2 copies, but there are currently 3 pods, then K8S will delete the third pod.
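To make this concrete, here is a minimal, hypothetical Deployment that establishes such a required state of 3 pod copies (the names and image are stand-ins for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app              # hypothetical application name
spec:
  replicas: 3                 # the required state: always 3 copies of the pod
  selector:
    matchLabels:
      app: demo-app
  template:                   # the pod definition K8S will copy 3 times
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: web
          image: nginx:1.25   # stand-in image for illustration
```

Delete one of the three pods and K8S will immediately create a replacement, because the current state no longer matches the required state.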
This is a massive oversimplification of K8S and what it provides. In addition to pods, you have objects like Deployments, Ingress, and Secrets to name a few. As the intent of this article is more on whether you should adopt K8S, I will not be going into all the internals of what K8S provides for you.
How is Kubernetes hosted?
A K8S cluster uses the concept of Nodes. Ultimately, a ‘Node’ is just a server. There are two types of Nodes: the Master Node and the Worker Node. The Master Node is in charge of maintaining the aforementioned ‘required state’; it is the brains of the operation. The Worker Node is responsible for actually running the aforementioned ‘pods’.
In a development environment, it is common to run the Master and Worker roles together on a single Node. This is referred to as a ‘Single Node Cluster’ and is useful when you don’t need High Availability and are just testing. For a production environment, you will want your Master Node separate from your Worker Nodes. You can host a Master Node on a VM, but Worker Nodes are best placed on Bare Metal machines so you aren’t stacking a second virtualization layer under your containers.
One of the biggest benefits of K8S is how many Worker Nodes it supports. As of v1.19 of K8S, it can handle up to 5,000 Worker Nodes, given enough resources on the Master Node. What this means is that it is very easy to add or remove resources in a K8S cluster. With a Bare Metal machine, you are limited by the maximum amount of RAM/CPUs the server can handle. With Virtual Machines, you can have multiple VMs on one host, but you are again limited by the maximum amount of RAM/CPUs the host machine can handle. In a K8S cluster, this limit is substantially higher. Once you start ‘running out’ of memory, you simply add a new node to the cluster. K8S treats, or ‘sees’, the resources of all Worker Nodes collectively and will leverage all available resources to schedule the pods.
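As a sketch of how simple adding capacity can be (assuming a cluster built with the standard kubeadm tool; managed cloud offerings make this even easier), a new server joins the cluster with a single command:

```bash
# Run on the new server. The address, token, and hash are placeholders;
# the real values come from the Master Node via
# 'kubeadm token create --print-join-command'.
kubeadm join <master-address>:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```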
Bonus: Most major cloud providers offer a managed K8S cluster option. For example, Azure has its ‘AKS’ solution. These cloud solutions tend to offer a pre-configured, Highly Available Master Node for free or at a low flat rate. All you have to do is add one or more Worker Nodes, and the Worker Nodes are effectively your only cost.
Comparing Hosting Solutions
Now that I’ve discussed what K8S is, I want to spend some time directly comparing it to the previously mentioned hosting solutions.
Kubernetes vs Bare Metal Machines
When you use Bare Metal Machines, you will typically have an IT or Ops department responsible for configuring the servers. This includes installing the product software, configuring that software, and managing disk space. It ultimately requires that your Ops department be well-versed in the specifics of the product, and this applies to every product the company owns, requiring even more knowledge.
One of the benefits of K8S is that it offloads a lot of this knowledge to the owners of the product. That is to say, Ops is still responsible for the setup and configuration of the K8S Cluster, for providing adequate storage space, and of course for security. However, that is where their involvement ultimately stops. From that point on, the product team manages details like which products to use, providing the appropriate container or service, and the configuration. This is done with Helm Charts or plain K8S configuration files.
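As a hedged example (using the public Bitnami Helm chart for PostgreSQL; the release name and password here are hypothetical), a product team can stand up its own database without touching the servers:

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-db bitnami/postgresql \
  --set auth.postgresPassword=changeme   # configuration lives with the team, not Ops
```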
I have also seen the scenario where Ops ‘owns’ everything up to and including the OS, but the team owns everything else, like installing the database software. In this scenario, one or more members of the team have admin-level access to install software or modify the server. While this setup is similar to a K8S install, you still lose out on all the auto-configuration that K8S provides. Further, your team’s engineers have to be Principal Full-Stack Developers to manage these servers half as well as your Ops team could or should. It also tends to be a violation of Zero Trust security principles, in my opinion.
In a Bare Metal environment, if your application needs more resources, your Ops team is responsible for adding the new server, configuring it, and ultimately including it in the infrastructure. This frequently takes a good bit of preparation and communication, so you are looking at quite a bit of lead time. In a K8S cluster, all Ops has to do in that same scenario is add a new node to the cluster, and they are done. This is substantially faster and can usually be done in a day.
Bonus: Most major cloud providers offer what is known as a ‘Cluster Autoscaler’ in their K8S management systems. This feature automatically adds or removes Worker Nodes based on workload: you set the minimum and maximum number of nodes, and the cloud provider scales the cluster based on available resources.
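As a hedged sketch using Azure’s CLI (the resource group and cluster names are hypothetical), enabling this on an existing AKS cluster is a single command:

```bash
az aks update \
  --resource-group my-rg \
  --name my-cluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5   # never fewer than 1 or more than 5 Worker Nodes
```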
Kubernetes vs Virtual Machines
When comparing Virtual Machines to K8S, it’s worth noting that just about everything from the Bare Metal comparison applies to Virtual Machines as well. Ultimately, a Virtual Machine is a smaller ‘chunk’ of a bigger machine. The biggest thing to consider is that, with Virtual Machines, your available resources will be less per VM, since each VM reserves its full chunk of RAM and CPU. That is, unless your Ops team is ‘good enough’ to assign 10 VMs with 10 GB of RAM each to a Host Machine with 36 GB of RAM.
Bonus: I’m being completely sarcastic when I talk about your Ops team assigning 10 VMs with 10 GB of RAM each to a Host Machine with 36 GB of RAM. This is simply not possible. If your server has 36 GB, then the absolute most RAM the server can hand out is 36 GB. You can have two 16 GB VMs, three 12 GB VMs, twelve 3 GB VMs, or some combination thereof. While memory can be ‘extended’, that works by using the hard drives to stand in for the missing RAM, and you will destroy your hard drives this way. I’ve seen director-level Ops members do this before. It simply isn’t correct.
Kubernetes vs Docker Compose
Docker Compose and K8S are very similar in their purpose. They both use YAML files to establish a running set of containers. That, however, is where the similarities ultimately stop. For example, Docker Compose doesn’t monitor the running containers to ensure the correct containers are always running. Additionally, in Docker Compose you effectively have one copy of each container, while in K8S you can have as many copies as you want. Granted, that’s not strictly true of Docker Compose: you could run more than one copy, but you are responsible for defining each copy and configuring the networking between those containers yourself. K8S does all of this for you, including items like load balancing connections among all the pod copies.
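For instance, a single Service definition (hypothetical names, matching the earlier Deployment sketch) is all it takes to load-balance across every copy of a pod:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app      # matches every pod copy carrying this label
  ports:
    - port: 80         # traffic to the Service is spread across all copies
      targetPort: 80
```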
What are some of the Cons of Kubernetes?
It wouldn’t be fair of me to speak so positively about K8S without also covering the cons.
Learning Curve
By far, the biggest con of K8S is the learning curve. K8S adds a lot of components that all mesh in a very particular way. This creates a massive learning curve, especially for the Ops team managing the K8S Cluster. Admittedly, it gets dramatically easier as time goes along, but it can be challenging for Ops and software engineers alike at first. It is absolutely imperative that if you take the K8S route, you hire someone who has been managing K8S for AT LEAST 3 years. This is especially true if you are using third-party software delivered as a K8S installation.
!!READ THAT AGAIN!!
You may think I’m being dramatic by saying you have to hire a new employee, but I’m quite serious. I’ve worked with several clients who decided to move to K8S and ended up just using their existing Ops staff, who had no prior experience with it. This meant that my team and I were essentially acting as trainers, teaching them basic K8S principles and how to configure their cluster. We effectively became the Ops team for these clients when we had been hired simply to develop the software they used in their company.
Latency
I talked earlier about the concept of Worker Nodes and Pods. The pod is the basic unit of work in any K8S Cluster. In a typical system, you will have more than one pod, whether multiple copies of the same pod or multiple unique pods. Either way, depending on your configuration, it is very likely that the pods of a given application will be created on different nodes in the cluster. This creates a scenario where pods incur additional network latency when communicating with each other.
It’s worth noting that this is ultimately no different than if you had software running on different VMs or Bare Metal machines. Additionally, there are things you can do, like establishing node or pod affinity, to influence where pods are scheduled and limit this latency as much as possible. Personally, I’ve never seen latency I didn’t consider acceptable.
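As a sketch (the label is hypothetical), a pod-affinity rule added to a pod spec asks the scheduler to prefer placing chatty pods on the same node:

```yaml
# Fragment of a pod spec
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # 'same node'
          labelSelector:
            matchLabels:
              app: demo-app                     # the pods to stay close to
```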
To be clear, I am talking about sub-second latency. That is to say that installing K8S isn’t going to cause you to suddenly have subpar responses. However, you may find that when you go from a development single-node cluster to a production multi-node cluster there are latencies.
Cost
This shouldn’t come as a surprise, but the cost of running K8S is proportional to the number of nodes in your cluster and the resources in those nodes. Costs can climb quickly if those nodes run all the time. It’s imperative to size your cluster correctly and consider Horizontal Pod Autoscaling, Vertical Pod Autoscaling, Cluster Autoscaling, and any other options for resizing based on workload to lower costs as much as possible. In my experience, clients can run into sticker shock when they start seeing some of the costs associated with cloud hosting providers like AWS. Generally, this is a sign the company isn’t leveraging the infrastructure the way it should, such as by shutting down machines during off hours when they aren’t being used. In other words, they are running the same resources at 1:00 PM during prime business hours as they are at 1:00 AM when nobody is in the office to use the software.
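Even a crude version of off-hours scaling is trivial in K8S. A scheduled job (the Deployment name is hypothetical) can run something like:

```bash
kubectl scale deployment demo-app --replicas=0   # evening: release the resources
kubectl scale deployment demo-app --replicas=3   # morning: bring it all back
```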
Bonus: When implemented and leveraged correctly, one of the greatest benefits of Cloud Providers like AWS is the ability to shut down resources dynamically when those resources are no longer in use. For example, starting a VM to process video editing and deleting the VM when it is done. You only pay for what you use. In a Data Center, this isn’t realistically possible. This is where the true Cost Savings of Cloud Providers comes into play. The upfront cost is likely to be higher, but the long-term costs of running can be drastically lowered when configured properly.
Storage
As mentioned above, pods are meant to be ephemeral. They can be short-lived, and when a pod is gone, no trace of it exists. If you were to connect to a pod, modify the file system, and restart the pod, your changes would be lost due to this ephemeral nature. This means you have to provide stateful storage to the pods, which introduces additional configurations and considerations to work with. Additionally, this storage frequently needs to be shared among all the nodes and accessed across the network.
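A minimal, hypothetical PersistentVolumeClaim looks like this (the storage class name is an assumption and depends on your provider):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteMany              # shared across pods on different nodes
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-client   # hypothetical; provider-specific
```

Pods then mount this claim as a volume, and the data survives pod restarts.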
Bonus: Most Cloud Providers charge for their ‘storage’ options separately from your server costs, which can further drive up the bill. NFS can also be used for storage to help alleviate some of these costs if you already have something available.
Unsuitable Workloads
K8S is an amazing piece of software, but there are some things you really should not run in it. The best example is a relational database like Oracle, MySQL, or Microsoft SQL Server. While you can technically run these in a StatefulSet, there are often a lot of complexities involved that don’t make it worth it.
Bonus: When using a Cloud Provider, before you instantly try to run a service in K8S, compare it against the options the provider offers; you may find the provider already has that service as a pre-built, managed package.
Should You Adopt K8S?
Now that we’ve gotten the definitions out of the way, it’s time to talk about whether or not your organization should adopt K8S. As with all technology, it’s going to ultimately depend on what you are trying to do with K8S.
For example, I run everything in K8S. I have Discord bots, websites, Twitch bots, and CI/CD pipelines all running in my various K8S clusters. My wife recently wanted me to set up a private Palworld server for her, so I installed K3S, a slimmed-down K8S distribution, and set up a Helm Chart to fully host that Palworld server. K8S works well for me and leaves very little for me to manage. It fits perfectly into my DevOps and DevSecOps-centric knowledge. I use my Cloud Provider’s database hosting but otherwise use nothing but K8S, with very few exceptions.
However, nothing I’m hosting is resource-intensive. I’m not running anything that has to process more than 20 requests per second. Furthermore, I’m able to do this at an acceptable cost for me. Additionally, I’ve been using K8S for close to 4 years. I know all the basics and can set up a new Helm Chart for an existing infrastructure very quickly. This gives me a leg up on any company with no K8S experience under their belt.
So, ultimately, you have to ask what you are trying to accomplish by moving to K8S. If you are researching K8S because your Ops team suggested it, then I would defer to them; they can provide all the reasons they think it would be worth it. If you are researching it because you want to be able to expand vertically and horizontally with ease, then K8S is an excellent option. If, however, you are considering it because it’s the newest kid on the block, while hosting software on a single server that gets updated once a year, then you may not need to bother; Terraform and Ansible are excellent choices instead of a full-blown K8S cluster when a single server does everything. And if you are already hosting a full production environment in a data center and considering the move to cloud services, then it will depend on the cost of moving, the cost of hiring the knowledge, and other such factors.
Conclusion
I’ve been managing infrastructure for 15 years on top of being a software engineer. Moving to K8S has made infrastructure management far easier and less stressful for me. I haven’t once upgraded K8S on my worker nodes because my cloud provider automatically provisions a new node on the latest version and transfers everything for me, without my having to blink twice. This leaves me more time to focus on my engineering and growth. I think K8S is an excellent option and should be considered a viable hosting solution by any company. However, I do acknowledge it’s not the easiest solution and needs to be carefully considered before making the jump.
Have something you wish for me to talk about? Let me know and I will see if it’s something I can look into.