How Much Time to Deploy a Kubernetes Cluster

I am testing the managed Kubernetes offerings from the three major cloud providers. Amazon Web Services has EKS, Microsoft Azure has AKS and Google Cloud Platform has GKE.

Google’s integration with its management console is leaps and bounds above the others. You can see the resources specific to Kubernetes, such as the virtual machine instances, the Kubernetes workloads (Pods) and the Services. Apart from that, most of the services have similar offerings, such as tie-ins to AWS EBS for persistent storage or Microsoft’s integration with Azure Files.

AWS is really the new kid on the block when it comes to managed Kubernetes. While they had ECS (Elastic Container Service), running Kubernetes meant creating EC2 instances and running kops to deploy a cluster. Now, with the managed integration using EKS, it’s much simpler, but being new also means seamless integration is not quite there yet. That said, attaching a cluster to a VPC or using IAM policies and roles is straightforward.

One of the things that stood out to me, though, was the time taken to deploy a cluster on each provider. Whether you run one (1) cluster with many namespaces or 1000 clusters with many namespaces, the time to get them running is important.

Disclaimer:

This is not a Kubernetes tutorial. Check out the official documentation or some of the awesome books out there, like Kubernetes in Action, for an introduction.

Tools Used

Deploying similar infrastructure to different cloud providers and managing it in a simple, code-like way is a perfect use case for Terraform by HashiCorp. Running a bunch of terraform plan, terraform apply and terraform destroy commands while creating and bringing down multiple clusters is still no fun, though, so I wrote a simple script to deploy, destroy and refresh these different environments. If running in production, I recommend integrating with an automation server (CI/CD pipeline tool) such as Spinnaker, Jenkins or Travis CI, amongst others; whatever your taste is or whatever supports your use case.

List of tools:

  • Terraform
  • Script to automate deployment
  • time command-line utility

Environment for Testing

I’ll be deploying as much of a like-for-like environment as possible with each provider. The diagram below was built with lucidchart.com and, although it’s “AWS”-centric, the design applies to all.

  • A network for the clusters to exist in
    • Multiple subnets
  • Three (3) initial nodes with autoscaling enabled
  • Some way to provide access from the internet, with
    • Services
    • Ingress Controllers

Diagram

[Image: Kubernetes Diagram]

The Testing

This is nothing scientific. It’s a simple script that takes some flags and runs actions depending on those flags.

  • Script called with each provider
    • AWS
    • Azure
    • GCP
  • Script also
    • Calls a patch for AWS storage to make it usable
    • Retrieves kubeconfigs and merges them into the global KUBECONFIG
  • Method passed performs actions
    • terraform plan and terraform apply
    • or
    • terraform destroy (if the action is to take down a cluster) - not shown here
  • Script is run under the time command-line utility
Creating Clusters


Azure Deploy


cloud/examples/azure-k8s-cluster via ⬢ v6.10.3 on  master []
•100% [I]time ../clusters.sh deploy --azure

Deploying to azure-k8s-cluster ->
$HOME/kubernetes_clusters/cloud/examples/azure-k8s-cluster

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

...
output truncated
...
Plan: 4 to add, 0 to change, 0 to destroy.

AWS Deploy


cloud/examples/aws-k8s-cluster on  master []
•100% [I]time ../clusters.sh deploy --aws

Deploying to aws-k8s-cluster ->
$HOME/kubernetes_clusters/cloud/examples/aws-k8s-cluster

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_availability_zones.azs: Refreshing state...
data.aws_region.current: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

...
output truncated
...
Plan: 29 to add, 0 to change, 0 to destroy.

GCP Deploy


cloud/examples/gcp-k8s-cluster took 14s on  master
•100% [I]time ../clusters.sh deploy --gcp

Deploying to gcp-k8s-cluster ->
$HOME/kubernetes_clusters/cloud/examples/gcp-k8s-cluster

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.google_compute_zones.zone_available: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

...
output truncated
...
Plan: 3 to add, 0 to change, 0 to destroy.

Quite a bit of Terraform output from the planning stage was removed, just to keep this readable ;).

Results

Let’s go from the “slowest” to the fastest deployment; in other words, in descending order.

AWS Results

Final output:

Apply complete! Resources: 29 added, 0 changed, 0 destroyed.
Running post aws-k8s-cluster deployment

../clusters.sh deploy --aws  3.38s user 2.26s system 0% cpu 10:24.69 total

Total time for AWS: 10:24.69

Azure Results

Final output:

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Running post azure-k8s-cluster deployment

../clusters.sh deploy --azure  2.21s user 1.85s system 0% cpu 14:19.72 total

Total time for Azure: 14:19.72

GCP Results

Final output:

Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
Running post gcp-k8s-cluster deployment

../clusters.sh deploy --gcp  1.50s user 1.18s system 1% cpu 2:22.11 total

Total time for GCP: 2:22.11

Results Review

According to the time utility, the Azure deployment took roughly 14 minutes and 20 seconds, the AWS deployment took about 10 minutes and 25 seconds, and the Google Cloud deployment took 2 minutes and 22 seconds. We’ll touch on that large time difference between GCP and the rest in a bit.
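To put those totals side by side, here is a small Python sketch that converts the elapsed values printed by time into seconds and compares each provider with the fastest one. The function names are made up for this example; the three totals are copied from the results above. By this arithmetic the Azure run took about 6 times as long as the GCP run, and the AWS run about 4.4 times as long.

```python
def parse_elapsed(total):
    """Convert an 'M:SS.ss' or 'H:MM:SS.ss' elapsed time into seconds."""
    seconds = 0.0
    for part in total.split(":"):
        # Each colon-separated field is worth 60 times the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed totals copied from the `time` output in the results section.
TOTALS = {"AWS": "10:24.69", "Azure": "14:19.72", "GCP": "2:22.11"}


def compare(totals):
    """Map each provider to (seconds, ratio relative to the fastest run)."""
    seconds = {name: parse_elapsed(value) for name, value in totals.items()}
    fastest = min(seconds.values())
    return {name: (secs, secs / fastest) for name, secs in seconds.items()}


def report(totals):
    """Print providers from slowest to fastest with their ratio."""
    rows = sorted(compare(totals).items(), key=lambda item: item[1][0], reverse=True)
    for name, (secs, ratio) in rows:
        print(f"{name}: {secs:.2f}s ({ratio:.1f}x the fastest)")
```

Keep in mind that these numbers only cover the terraform apply run itself, not the time until a cluster is actually usable.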

I mentioned that the list went from slowest to fastest, right? So why is AWS first (being the slowest), you ask? That’s a great question! While it does seem that the deployment to AWS was about 4 minutes faster than to Azure, this result is misleading. I wanted to make sure that the total time, from start to the cluster being usable, was taken into account.

From the moment the terraform apply job ended, the Azure deployment was complete and usable. Pods, Services, Deployments, DaemonSets: all of these could be created by running kubectl apply -f [file-name].yaml. The AWS cluster, on the other hand, needed roughly another 5 to 10 minutes for the EC2 instances running the worker nodes to finish booting and deploying. It’s not a static time and I did not measure it with a stopwatch 😜.
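If you wanted to replace the missing stopwatch, one option is to poll the cluster until the expected number of worker nodes report Ready. The Python sketch below is not something I ran for this post. It assumes kubectl is already pointed at the new cluster, it treats "usable" as "three nodes in Ready status" (matching the three initial nodes in the test environment), and the function names and default timings are made up for illustration.

```python
import subprocess
import time


def count_ready_nodes(kubectl_output):
    """Count nodes whose STATUS column is exactly 'Ready'."""
    ready = 0
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Columns without headers: NAME STATUS ROLES AGE VERSION
        if len(fields) >= 2 and fields[1] == "Ready":
            ready += 1
    return ready


def kubectl_get_nodes():
    """Return the node list as text; empty if the API is not reachable yet."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "--no-headers"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def seconds_until_usable(expected_nodes=3, timeout=1200.0, interval=10.0,
                         get_nodes=kubectl_get_nodes,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until enough nodes are Ready and return the seconds waited."""
    start = clock()
    while clock() - start < timeout:
        if count_ready_nodes(get_nodes()) >= expected_nodes:
            return clock() - start
        sleep(interval)
    raise TimeoutError(f"fewer than {expected_nodes} nodes Ready after {timeout}s")
```

Started right after terraform apply returns, the value from seconds_until_usable() could be added to the apply time to approximate the start-to-usable total for each provider. The result is only as precise as the polling interval.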

The Winner

What is impressive to me is the result from Google’s side. From the moment the terraform apply completed, in just over two (2) minutes, the cluster was fully usable. It also seems (which I will test later) that when deploying Kubernetes Services that integrate with the cloud platform’s networking, such as load balancers, GCP wins in making those services accessible the quickest.

If you look at the output from the results section, you’ll also notice the discrepancy between AWS and the others in the number of resources deployed. That difference is not the whole picture, as many other things are happening behind the scenes when deploying. AWS gives you the control of manually deploying the autoscaling groups, VPCs, subnets, security groups and others, while with the other providers this is a simple variable or setting changed in the Terraform resource template. This can be seen as either a negative or a positive, depending on which side of the argument you sit on. That’s not to say that those things are not configurable with the other Terraform providers as well, but they are NOT required with GCP or Azure, while they ARE with AWS.

Final Thoughts

This is not an endorsement or disapproval of any of the providers. It is simply one metric and something I found interesting and useful to consider. They each have their benefits, especially if the organization you work for already has workloads in one or multiple providers.

I would say that one of the biggest benefits of Kubernetes is that it abstracts away the requirement to know which networks Pods are sitting in. With service discovery built in, or with great tools such as Istio for service mesh deployments, going multi-cloud is easier than ever, allowing workloads to be not only multi-region but provider-independent as well.

If you’re considering containerizing applications, the easiest and quickest way to get into production is with one of the cloud providers’ Kubernetes services. It is definitely possible to run your own clusters, but I wouldn’t want to be you administering them, especially when considering high availability for the master nodes.

Bonus

It takes about the same amount of time to destroy the clusters on each provider.
