Like most startups, we began small: small agile teams, a smaller scale, and a host of typical engineering problems. Each team set up its own infrastructure for its applications for the quickest turnaround, and each team had its own way of building and deploying. This approach had a couple of very obvious issues:
- Poor resource utilisation — All our applications make heavy use of caching (high memory utilisation) but need comparatively little CPU. Since no AWS instance type matched this profile exactly, we tended to use larger machines than required.
- Managing permissions — To start off, we allowed most developers to provision their own resources. This worked in the short term, but became hard to audit as we began to scale rapidly (in both machines and developers).
Individual teams tried to improve resource utilisation by manually co-locating compatible services on shared machines, but this broke isolation and required constant rework to keep things balanced. Instead of applying short-term, band-aid solutions, we decided it was time to fix both problems permanently.
Kubernetes to the rescue!
Looking around for pre-made solutions, we were drawn to Kubernetes. Kubernetes (k8s) is the most popular container orchestration software available, used as a standard across organisations big and small. We evaluated it quickly and realised it was the perfect tool to solve both our problems.
We chose AWS’ Elastic Kubernetes Service (EKS) as the platform to host our k8s cluster, since it offloads control plane management. It also allows us to define our entire infrastructure as configuration, ensuring that on-boarding a new application is as easy as creating a basic config file (with only a few mandatory values, and some application-specific overrides).
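As an illustration, such an onboarding file could look roughly like the following. The file name, keys and values here are hypothetical, not our exact schema:

```yaml
# app-config.yaml — hypothetical onboarding file (keys illustrative)
name: order-service        # mandatory: application name
team: payments             # mandatory: owning team
resources:
  cpu: 500m                # mandatory: CPU request per pod
  memory: 2Gi              # mandatory: memory request per pod
ingress: internal          # optional override: external | internal | vpn
replicas:
  min: 3                   # optional override: autoscaling floor
  max: 10                  # optional override: autoscaling ceiling
```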
It took us a few weeks to define what our infrastructure should look like:
- An EKS cluster: this is where we host the worker nodes that run our applications
- A Build and deploy pipeline: Jenkins for performing the builds & Spinnaker as the deployment platform
- Logging: via Fluentd
- Monitoring: through Prometheus, Grafana and NewRelic
The EKS Cluster
We set up a dev, a stage and a prod cluster. We used the dev cluster for all the initial testing and experimentation; only once we had finalised the settings we wanted did we set up the stage and prod clusters.
Amazon provides EKS-optimised AMIs, with settings like the maximum pods per node, which we used for our worker nodes along with user data based on “90 days of AWS EKS in Production”. This allowed us to reserve CPU and memory for system processes and the kubelet, thereby preventing a faulty container from taking up all the CPU.
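As a sketch, these reservations can be passed as kubelet flags through the EKS AMI’s bootstrap script in the instance user data. The cluster name and all the values below are illustrative, not our production numbers:

```shell
#!/bin/bash
# EKS worker node user data (illustrative values)
# Reserve CPU/memory for the kubelet and OS daemons so a runaway
# container cannot starve the node's system processes.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--kube-reserved=cpu=250m,memory=1Gi,ephemeral-storage=1Gi \
    --system-reserved=cpu=250m,memory=0.5Gi,ephemeral-storage=1Gi \
    --eviction-hard=memory.available<500Mi,nodefs.available<10%'
```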
The build and deploy pipeline
To build and deploy an application we use:
- Jenkins: a job configured via a Jenkinsfile that (i) creates Docker images using a multi-stage Docker build, and (ii) configures the Kubernetes manifests for the application and pushes them to the chart repository.
- Spinnaker: For deploying the applications
For all our applications, we build images using a multi-stage Docker build to isolate the build environment, and use slim images as runner containers (Alpine images are smaller than slim, but they have an open DNS issue). This setup helped us minimise OS size, enabled better security measures, made scaling fast and easy, and lowered network and storage costs.
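A minimal sketch of such a multi-stage build — the stack, image tags and paths are illustrative, not our actual Dockerfiles:

```dockerfile
# Build stage: full toolchain, never shipped to production
FROM node:12 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Run stage: slim base image keeps the final image small
FROM node:12-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]
```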
We established standards for Dockerfiles, Jenkinsfiles, and k8s manifests which are followed by all teams to build & deploy their applications.
For every application, we need:
- CPU and Memory requirements
- Load balancer configuration
- Scaling criteria
- High availability of the application
- Easy deployments/rollbacks
- Graceful Termination
To fulfil all these requirements, we use Helm as a templating tool: we created a generic template, or Helm chart, which each application overrides and which is finally pushed to ChartMuseum.
This allows us to keep all boiler-plate configurations in one place, and update all apps as necessary.
E.g., when we wanted to add a pre-stop hook to every application, all we did was update the generic template, as opposed to editing individual charts in each application’s own repository.
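For example, the pre-stop hook lives once in the generic chart’s deployment template, roughly like this (the sleep duration and template layout are illustrative):

```yaml
# templates/deployment.yaml in the generic chart — every app inherits this
spec:
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          lifecycle:
            preStop:
              exec:
                # give the load balancer time to drain in-flight requests
                command: ["sh", "-c", "sleep 10"]
```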
ChartMuseum is hosted on the cluster itself, with S3 serving as persistent storage.
Our helm template is organised as follows:
- Deployment configuration: The deployment defines 2 containers — the application container, and a Fluentd-CloudWatch sidecar container (explained in detail further below). Configs like update strategy, pod anti-affinity, liveness and readiness probes are defined here.
- Service: These are application-specific service configurations
- Ingress: We defined 3 types of ingress manifests (external, internal and VPN). These manifests allow us to share load balancers between applications. Compared to the 80+ load balancers in our previous setup, where one load balancer was dedicated to each application, we now need as few as 6–10 load balancers in our k8s setup across all environments
- Service Monitor: This is used to expose business metrics to Prometheus
- Pod Disruption Budget: This prevents the application’s instance count from falling below a certain threshold. However, it helps only against voluntary terminations of instances.
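A PodDisruptionBudget for such an application might look like this — the name, label and threshold are illustrative, and the `apiVersion` depends on the cluster version (older clusters use `policy/v1beta1`):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-service-pdb        # illustrative name
spec:
  minAvailable: 2                # voluntary evictions never drop below 2 pods
  selector:
    matchLabels:
      app: order-service         # illustrative app label
```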
Defining a stable deployment system was one of the hardest tasks. When the focus is on standardising the development and deployment pipeline, the idea of distributing kubeconfigs to each developer, and running a `kubectl apply` for deploying an application seems inconsistent. What we needed was a central system, that could perform seamless deployments and rollbacks, with authentication and authorisation capabilities.
We evaluated four options for our use case: Jenkins X, Kubeapps, Harness.io and Spinnaker.
We rejected the first three and went ahead with Spinnaker for the following reasons:
- Jenkins X created all the deployment revisions in GitHub itself; this redundant step increased the deploy time from 4–5 minutes to almost 10 minutes.
- Kubeapps used service-account-token-based authentication and authorisation, which added the complexity of maintaining tokens for individuals.
- Harness.io had not released its community edition at the time, and its pricing was aggressively tied to the number of deployment instances, i.e. pods in Kubernetes terms. Our near-future estimates showed that the costs incurred would outweigh the benefits, making it unfit for the long run.
Spinnaker for the win!
Spinnaker’s commands section was sufficient for us to define what we needed. After configuring Spinnaker, we defined pipelines for all the applications, triggered on completion of the Jenkins job, with the end result being k8s pods running in their respective namespaces.
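Conceptually, each pipeline is triggered by the Jenkins build, renders the Helm chart and applies the resulting manifests. Spinnaker stores pipelines as JSON; a trimmed sketch, with the master and job names hypothetical:

```json
{
  "name": "deploy-order-service",
  "triggers": [
    { "type": "jenkins", "master": "build-jenkins", "job": "order-service-build", "enabled": true }
  ],
  "stages": [
    { "type": "bakeManifest", "name": "Render helm chart" },
    { "type": "deployManifest", "name": "Deploy to namespace" }
  ]
}
```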
In our previous setup, we had all applications writing logs to different files such as access, debug, info, error, and the CloudWatch agent was consuming those files. We moved to k8s and didn’t want to change much of that.
Approach 1: Mount the file system of the host machine onto the container so that applications can write to the host machine, and have an agent (CloudWatch agent or Fluentd) push to CloudWatch.
Approach 2: Deploy a Fluentd sidecar along with the application, both mounting the same volume specified in the deployment config. This way, logs are written to temp storage by the application container and read by the Fluentd sidecar.
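The second approach, sketched as a trimmed pod spec — the image names and log path are illustrative:

```yaml
# Trimmed deployment pod spec for approach 2 (names/paths illustrative)
spec:
  containers:
    - name: app
      image: order-service:latest
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app   # application writes its log files here
    - name: fluentd
      image: fluent/fluentd:v1.14
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app   # sidecar tails the same files, ships to CloudWatch
  volumes:
    - name: app-logs
      emptyDir: {}                  # pod-local temp storage shared by both containers
```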
We use the second approach in production, and we use Kubernetes metadata to define our log group and log stream.
For applications like NGINX and kube-proxy which write to stdout and stderr, we deployed a Fluentd DaemonSet, that reads from stdout and stderr of the containers, as well as other system logs, and pushes to CloudWatch. AWS has documented the steps here.
We’ve been using a Prometheus and Grafana setup, along with New Relic, to capture all system and application metrics. When applications are migrated to k8s, APM works out of the box because of the New Relic agent in the codebase.
For k8s-based monitoring, there’s a plethora of dashboards providing all the necessary visualisation of the cluster’s CPU, memory and disk metrics. Below is an NGINX ingress controller dashboard showing request and error metrics across namespaces. We also have dedicated monitoring for Spinnaker components within the same setup.
In the end…
Onboarding a new application is now as simple as running a script, which creates necessary resources on AWS and K8s (such as namespaces and ECR repository) and the Spinnaker pipelines. Also, with every config as code, teams are now able to make reliable changes to their application setup, using GitHub alone.
Curefit already has 25% of its applications running on K8s, with a plan to migrate 100% of applications over the coming months.
We’ve already managed to reduce our costs by 60% on the services we’ve migrated, a figure that is bound to grow as we migrate more services (since fixed costs remain the same).
The deployment time has gone down by almost 75%, even though we’ve introduced alpha deployments to the pipelines.
Scaling up applications has also become 85% faster, since there is zero wait time for instances to come up (we typically run with 30% spare capacity). Images are already present on the instances, and containers pass the readiness check in under a minute. This allows us to handle a traffic surge with almost no impact on latencies.