I am testing the managed Kubernetes offerings from the three major cloud providers: Amazon Web Services has EKS, Microsoft Azure has AKS and Google Cloud Platform has GKE.
Google's integration with its management console is leaps and bounds above the others: from the console you can see Kubernetes-specific workloads such as the virtual machine instances backing the nodes, Kubernetes workloads (Pods) and Services. Beyond that, most of the providers have similar offerings, such as tie-ins to AWS EBS for persistent storage or Microsoft's integration with Azure Files.
AWS is really the new kid on the block when it comes to managed Kubernetes. While they have long had ECS (Elastic Container Service), to run Kubernetes one had to create EC2 instances and run kops to deploy a cluster. Now with the managed integration using EKS it's much simpler, but it also means seamless integration is not quite there yet. That said, attaching a cluster to a VPC or using IAM policies and roles is straightforward.
One of the things that stood out to me was the time taken to deploy a cluster on each provider. Whether you run one (1) cluster with many namespaces or 1,000 clusters with many namespaces, the time to get up and running is important.
Deploying similar infrastructure to different cloud providers and managing it in a simple, code-like way is a perfect use case for Terraform by HashiCorp. Still, running a bunch of terraform plan, terraform apply and terraform destroy commands while creating and tearing down multiple clusters is no fun, so I wrote a simple script to deploy, destroy and refresh these different environments. If running in production, I recommend integrating with an automation server (a CI/CD pipeline tool) such as Spinnaker, Jenkins or Travis CI, amongst others; whatever your taste is or whatever supports your use case.
List of tools:
- Script to automate deployment
- time command-line utility
Environment for Testing
I'll be deploying as much of a like-for-like environment as possible with each provider. The diagram below was built with lucidchart.com and although it's "AWS" centric, the design applies to all three.
- A network for the clusters to exist in
- Multiple subnets
- Three (3) initial nodes with autoscaling enabled
- Some way to provide access from the internet
- Ingress Controllers
This is nothing scientific, it’s a simple script that takes some flags and runs actions depending on those flags.
- The script is called once per provider
- The script also:
  - Patches AWS storage to make it usable
  - Retrieves the kubeconfigs and merges them into the global KUBECONFIG
- The method passed to it performs the actions:
  - terraform plan and terraform apply
  - terraform destroy (if the action is to take down the cluster) - not shown here
- The time command-line utility times each run of the script
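The script itself isn't reproduced in this post, but the steps above can be sketched roughly like this. Everything here is hypothetical (the function names, the per-cluster kubeconfig file naming, and the assumption that each provider lives in an examples/<provider>-k8s-cluster Terraform directory); it's an illustration of the flow, not the actual clusters.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of clusters.sh; the real script is not shown in the post.
# Assumes a <provider>-k8s-cluster Terraform directory per provider.
set -euo pipefail

provider_dir() {
  # Map a provider flag to its Terraform example directory.
  case "$1" in
    --aws)   echo "aws-k8s-cluster" ;;
    --azure) echo "azure-k8s-cluster" ;;
    --gcp)   echo "gcp-k8s-cluster" ;;
    *)       echo "usage: clusters.sh deploy|destroy --aws|--azure|--gcp" >&2; return 1 ;;
  esac
}

merge_kubeconfigs() {
  # Merge every per-cluster kubeconfig into the global one
  # (the config-* naming is an assumption).
  KUBECONFIG=$(ls "$HOME"/.kube/config-* | paste -sd: -) \
    kubectl config view --flatten > "$HOME/.kube/config"
}

deploy() {
  local dir; dir=$(provider_dir "$1")
  echo "Deploying to ${dir}"
  (cd "${dir}" && terraform plan && terraform apply -auto-approve)
  # Post-deploy steps: patch AWS storage, then merge kubeconfigs.
  merge_kubeconfigs
}

destroy() {
  local dir; dir=$(provider_dir "$1")
  (cd "${dir}" && terraform destroy -auto-approve)
}
```

With a small dispatcher at the bottom choosing deploy or destroy from the first argument, it would be invoked exactly as in the timings that follow, e.g. time ../clusters.sh deploy --aws.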
cloud/examples/azure-k8s-cluster via ⬢ v6.10.3 on master [⇡] •100% [I]
➜ time ../clusters.sh deploy --azure
Deploying to azure-k8s-cluster -> $HOME/kubernetes_clusters/cloud/examples/azure-k8s-cluster
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
... output truncated ...
Plan: 4 to add, 0 to change, 0 to destroy.
cloud/examples/aws-k8s-cluster on master [⇡] •100% [I]
➜ time ../clusters.sh deploy --aws
Deploying to aws-k8s-cluster -> $HOME/kubernetes_clusters/cloud/examples/aws-k8s-cluster
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.aws_availability_zones.azs: Refreshing state...
data.aws_region.current: Refreshing state...
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
... output truncated ...
Plan: 29 to add, 0 to change, 0 to destroy.
cloud/examples/gcp-k8s-cluster took 14s on master •100% [I]
➜ time ../clusters.sh deploy --gcp
Deploying to gcp-k8s-cluster -> $HOME/kubernetes_clusters/cloud/examples/gcp-k8s-cluster
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.google_compute_zones.zone_available: Refreshing state...
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create
... output truncated ...
Plan: 3 to add, 0 to change, 0 to destroy.
Quite a bit of Terraform output from the planning stage has been removed, just to keep this readable ;).
Let's start with the "slowest" and work toward the fastest deployment; in other words, let's go in descending order.
Apply complete! Resources: 29 added, 0 changed, 0 destroyed.
Running post aws-k8s-cluster deployment
../clusters.sh deploy --aws  3.38s user 2.26s system 0% cpu 10:24.69 total
Total time for AWS: 10:24.69
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Running post azure-k8s-cluster deployment
../clusters.sh deploy --azure  2.21s user 1.85s system 0% cpu 14:19.72 total
Total time for Azure: 14:19.72
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
Running post gcp-k8s-cluster deployment
../clusters.sh deploy --gcp  1.50s user 1.18s system 1% cpu 2:22.11 total
Total time for GCP: 2:22.11
According to the time utility, the deployment to Azure took roughly 14 minutes and 20 seconds, the deployment to AWS took about 10 minutes and 25 seconds, and the Google Cloud deployment took 2 minutes and 22 seconds. We'll touch on that large time difference between GCP and the rest in a bit.
I mentioned that the list was from slowest to fastest, right? So why is AWS first (as the slowest), you ask? That's a great question! While it does seem that the deployment to AWS was about 4 minutes faster than to Azure, this result is misleading. I wanted to make sure that the total time, from start to the cluster being usable, was taken into account.
From the moment the terraform apply job ended, the Azure deployment was complete and usable. Pods, Services, Deployments, DaemonSets: all of these could be created by running kubectl apply -f [file-name].yaml. The AWS cluster, on the other hand, needed roughly another 5 to 10 minutes for the EC2 instances running the worker nodes to finish booting and joining the cluster. It's not a fixed amount of time, and I did not measure it with a stopwatch 😜.
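Rather than eyeballing that AWS node delay, a script can poll for it. A minimal sketch, assuming kubectl is pointed at the new cluster (count_ready is my own hypothetical helper, not something from the script above):

```shell
# Count how many nodes report Ready, parsing `kubectl get nodes --no-headers`
# style output on stdin, where the second column is the node STATUS.
count_ready() {
  awk '$2 == "Ready"' | wc -l | tr -d ' '
}

# Against a live cluster you could poll until the expected node count appears:
#   while [ "$(kubectl get nodes --no-headers | count_ready)" -lt 3 ]; do sleep 10; done
# or skip the helper entirely with kubectl's built-in waiting:
#   kubectl wait --for=condition=Ready node --all --timeout=10m
```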
What is impressive to me is the result from Google's side. From the moment the terraform apply completed, in just over two (2) minutes, the cluster was fully usable. It also seems (which I will test later) that when deploying Kubernetes Services that integrate with the cloud platform's networking, such as load balancers, GCP wins at making those services accessible the quickest.
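That later test boils down to creating a Service of type LoadBalancer on each provider and timing how long the external address takes to appear. A sketch of what I'd apply (lb_service_manifest is a made-up helper, and the port numbers are arbitrary placeholders):

```shell
# Hypothetical helper: print a minimal LoadBalancer Service manifest for an app,
# which each cloud provider turns into its own load balancer (ELB on AWS,
# Azure Load Balancer, GCP load balancer).
lb_service_manifest() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  type: LoadBalancer
  selector:
    app: $1
  ports:
  - port: 80
    targetPort: 8080
EOF
}

# Usage against a live cluster:
#   lb_service_manifest my-app | kubectl apply -f -
# then watch how long `kubectl get service my-app` takes to show an EXTERNAL-IP.
```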
If you look at the output from the results section, you'll also notice the discrepancy between AWS and the others in the number of resources deployed. This comparison is not entirely fair, as many other things are happening behind the scenes during deployment. AWS gives you control by having you manually deploy the autoscaling groups, VPCs, subnets, security groups and so on, while with the others this is a simple variable or setting changed in the Terraform resource template. This can be seen as either negative or positive, depending on which side of the argument you sit on. That's not to say those things are not configurable with the other Terraform providers as well, but they are NOT required with GCP or Azure, while they ARE with AWS.
This is not an endorsement or disapproval of any of the providers. It is simply one metric, and something I found interesting and useful to consider. They each have their benefits, especially if the organization you work for already has workloads in one or more of them.
I would say one of the biggest benefits of Kubernetes is that it abstracts away the requirement to know which networks Pods are sitting in, with service discovery built in and great tools such as [Istio][istio] for service mesh deployments. Going multi-cloud is easier than ever, allowing workloads that are not only multi-region but provider independent as well.
If you're considering containerizing applications, the easiest and quickest way to get to production is with one of the cloud providers' Kubernetes services. It is definitely possible to run your own clusters, but I wouldn't want to be the one administering them, especially when considering high availability for the master nodes.
It takes about the same amount of time to destroy the clusters on each provider.