How Scalable is Argo-Rollouts: A Cloud Operator’s Perspective

Argo-Rollouts brings advanced deployment capabilities to Kubernetes, such as blue-green and canary update strategies, automated rollback and promotion, configurable update steps, and fine-grained, weighted traffic control. As Argo-Rollouts reaches its first major release, v1.0, companies are moving rapidly to adopt it into their continuous deployment infrastructure. In parallel, work is underway to prove Argo-Rollouts' scalability.

In this talk, we present our methodology for benchmarking the Argo-Rollouts controller as it manages the life cycle of a large number of Rollout custom resources in a realistic cloud environment. For this purpose, we developed argo-rollouts-benchmark, a load-generation and performance-measurement tool that emulates users making continuous Kubernetes API requests with configurable quantity and concurrency (e.g., 10 concurrent users creating 100 Rollouts in the cluster). While the Argo-Rollouts controller under test reconciles these Rollout CRs to their desired state, the benchmark tool collects the following metrics: convergence latency (the time between the controller receiving a Rollout CR and the Rollout reaching a conclusive phase such as Healthy, Degraded, or Paused), reported as a percentile distribution; timeout error rate (the percentage of Rollouts that become Degraded due to timeout); and throughput.
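The tool itself is not reproduced here, but a minimal sketch of the approach might look like the following. It assumes client-go's dynamic client, a "default" namespace, a placeholder nginx image with an empty canary strategy, and the status.phase field introduced in v1.0; the names and parameters are illustrative, not the actual argo-rollouts-benchmark implementation. It creates Rollouts with bounded concurrency, polls each until it reaches a conclusive phase, and reports percentile latencies and the timeout rate.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"sync"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// GroupVersionResource of the Rollout custom resource.
var rolloutGVR = schema.GroupVersionResource{Group: "argoproj.io", Version: "v1alpha1", Resource: "rollouts"}

// newRollout builds a minimal Rollout manifest; the image and canary strategy are placeholders.
func newRollout(name string) *unstructured.Unstructured {
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "argoproj.io/v1alpha1",
		"kind":       "Rollout",
		"metadata":   map[string]interface{}{"name": name},
		"spec": map[string]interface{}{
			"replicas": int64(1),
			"selector": map[string]interface{}{"matchLabels": map[string]interface{}{"app": name}},
			"strategy": map[string]interface{}{"canary": map[string]interface{}{}},
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{"labels": map[string]interface{}{"app": name}},
				"spec": map[string]interface{}{"containers": []interface{}{
					map[string]interface{}{"name": "app", "image": "nginx:1.21"},
				}},
			},
		},
	}}
}

func main() {
	const total, workers, namespace = 100, 10, "default" // e.g. 10 concurrent "users" creating 100 Rollouts
	cfg, err := clientcmd.BuildConfigFromFlags("", filepath.Join(os.Getenv("HOME"), ".kube", "config"))
	if err != nil {
		panic(err)
	}
	rollouts := dynamic.NewForConfigOrDie(cfg).Resource(rolloutGVR).Namespace(namespace)

	var (
		mu        sync.Mutex
		latencies []time.Duration
		timeouts  int
		wg        sync.WaitGroup
	)
	sem := make(chan struct{}, workers) // bounds in-flight work to `workers` concurrent users

	for i := 0; i < total; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()

			name := fmt.Sprintf("bench-rollout-%d", i)
			start := time.Now()
			if _, err := rollouts.Create(context.TODO(), newRollout(name), metav1.CreateOptions{}); err != nil {
				return
			}
			// Poll until the Rollout reaches a conclusive phase, or count it as a timeout.
			for deadline := time.Now().Add(5 * time.Minute); time.Now().Before(deadline); time.Sleep(2 * time.Second) {
				ro, err := rollouts.Get(context.TODO(), name, metav1.GetOptions{})
				if err != nil {
					continue
				}
				if phase, _, _ := unstructured.NestedString(ro.Object, "status", "phase"); phase == "Healthy" || phase == "Degraded" || phase == "Paused" {
					mu.Lock()
					latencies = append(latencies, time.Since(start))
					mu.Unlock()
					return
				}
			}
			mu.Lock()
			timeouts++
			mu.Unlock()
		}(i)
	}
	wg.Wait()

	// Report convergence latency percentiles and the timeout error rate.
	sort.Slice(latencies, func(a, b int) bool { return latencies[a] < latencies[b] })
	if len(latencies) > 0 {
		pct := func(p float64) time.Duration { return latencies[int(p*float64(len(latencies)-1))] }
		fmt.Printf("p50=%v p95=%v p99=%v\n", pct(0.50), pct(0.95), pct(0.99))
	}
	fmt.Printf("timeout rate: %.1f%%\n", 100*float64(timeouts)/float64(total))
}
```

The buffered channel acts as a semaphore so the number of in-flight creations and polls matches the intended number of emulated users, which is what makes the quantity/concurrency knobs meaningful.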

We will share the latest results from our experiments and show how they have helped improve the overall scalability of Argo-Rollouts. We will then look at ways, such as predetermined t-shirt sizes and autoscaling, to optimize the resource provisioning of Argo-Rollouts to accommodate varying customer demand. Based on these findings, we define the SLOs for the deployment capability offerings we build atop Argo-Rollouts. Finally, the talk shows how to evaluate Argo-Rollouts performance in your own clusters.
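As one starting point for evaluating the controller in your own cluster, the controller exposes its own Prometheus metrics alongside the black-box measurements described above. The sketch below is a hedged example: it assumes the controller's metrics endpoint is reachable on localhost:8090 (e.g., via kubectl port-forward) and that reconcile-related series are named with the rollout_reconcile prefix; verify both against your installation.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Assumes the controller's metrics endpoint has been forwarded locally, e.g.:
	//   kubectl port-forward -n argo-rollouts deployment/argo-rollouts 8090:8090
	resp, err := http.Get("http://localhost:8090/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		// Reconcile latency histograms and error counters give a quick white-box
		// view of how hard the controller is working under your load.
		if line := sc.Text(); strings.HasPrefix(line, "rollout_reconcile") {
			fmt.Println(line)
		}
	}
}
```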