Processing petabytes in Python with Argo Workflows & Dask

Companies with complex computational workloads often rely on Python packages such as NumPy, Pandas, and Scikit-Learn, but unfortunately these tools don't scale well to especially large data sets. Dask makes it easy to use them on large data sets in distributed environments with low latency. Argo Workflows is the best way to run pipelines on Kubernetes, but it incurs high overhead when tasks are short-lived. This talk demonstrates how to combine these two technologies, pairing the Kubernetes-native scheduling and automation of Argo Workflows with the low-latency scalability of Dask.
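To give a flavor of the pattern, here is a minimal sketch of the kind of Python entrypoint an Argo Workflows step might run against an existing Dask cluster. The scheduler address, dataset path, and column names are illustrative assumptions, not ACCURE's actual pipeline.

```python
# Minimal sketch: a script an Argo Workflows container step could execute.
# Assumes a Dask scheduler is reachable at DASK_SCHEDULER_ADDRESS (e.g. a
# scheduler Service in the same Kubernetes namespace); the S3 path and
# column names are placeholders.
import os

import dask.dataframe as dd
from dask.distributed import Client


def main() -> None:
    # Connect to the existing Dask cluster instead of computing locally.
    client = Client(os.environ["DASK_SCHEDULER_ADDRESS"])

    # Lazily read a partitioned Parquet dataset; the heavy lifting happens
    # on the Dask workers, not in the Argo pod running this script.
    df = dd.read_parquet("s3://example-bucket/battery-telemetry/*.parquet")

    # Example aggregation: mean cell voltage per battery pack.
    result = df.groupby("pack_id")["cell_voltage"].mean().compute()
    result.to_csv("/tmp/pack_voltage_means.csv")

    client.close()


if __name__ == "__main__":
    main()
```

Argo handles scheduling, retries, and dependencies between steps like this one, while Dask handles the fine-grained, low-latency parallelism within a step.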

To do so, we'll walk through how Pipekit provisioned Argo Workflows for ACCURE Battery Intelligence and how ACCURE runs Dask on Argo to process petabytes of data with very low latency for their customers.

We hope that attendees will leave this talk knowing how to orchestrate their Dask workloads with Argo Workflows, along with insights that carry over to orchestrating other workloads on Argo, such as Spark jobs.