Back in October, we released Aqueduct v0.1. That release laid the groundwork for simplified ML orchestration by defining a Python-native API that allowed you to construct machine learning pipelines, plumb those pipelines with data, and ensure that your models perform well on an ongoing basis.
Today, we’re releasing Aqueduct v0.2, and we couldn’t be more excited about the progress we’ve made. Based on feedback from our users, we’ve been expanding Aqueduct to integrate with a wide variety of cloud infrastructure, with the goal of being the simplest way to run machine learning in the cloud.
Aqueduct v0.2 has key innovations on two fronts: deeper integrations with existing cloud infrastructure, and better visibility into the ML workflows running on that infrastructure.
Aqueduct v0.1 had basic support for running workflows on Kubernetes, Airflow, and AWS Lambda. A workflow defined in Aqueduct could run automatically on your existing infrastructure, but that support was quite limited: you couldn’t control the resources available to an Aqueduct operator or distribute the compute for a single operator.
As of v0.1.5, Aqueduct supports fine-grained resource configuration on AWS Lambda and Kubernetes. We’ve added support for CPU and memory configuration, and on Kubernetes clusters, you can run operators on GPUs. Aqueduct will automatically configure and deploy a container with the correct drivers installed to run your functions.
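To make this concrete, here’s a minimal sketch of per-operator resource requests using the decorator API. The exact keys in the resources dictionary are our best-effort reading of the API and should be treated as illustrative rather than definitive:

```python
from aqueduct import op

# Illustrative resource requests for a single operator. The key names
# below ("num_cpus", "memory", "gpu_resource_name") are assumptions;
# check the docs for the parameters supported by your engine.
@op(
    resources={
        "num_cpus": 4,                          # CPUs for this operator
        "memory": "8GB",                        # memory for this operator
        "gpu_resource_name": "nvidia.com/gpu",  # request a GPU (Kubernetes only)
    }
)
def train_model(features):
    # Training code runs in a container provisioned with the resources
    # requested above; on Kubernetes, Aqueduct installs the GPU drivers.
    ...
```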
As of v0.2, Aqueduct supports running workflows on Databricks Spark clusters (vanilla Spark coming soon!). This lets teams take advantage of Spark’s seamless distribution of compute, especially for easily parallelizable code. You can use Aqueduct’s decorator API to define a workflow that runs on a Databricks Spark cluster and have Aqueduct handle deployment and orchestration while providing ongoing visibility.
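As a rough sketch of what this looks like end to end, assuming a data source and a Databricks integration have already been connected under the hypothetical names “demo” and “my_databricks”, you might write:

```python
import aqueduct as aq

# Placeholder credentials: point the client at your Aqueduct server.
client = aq.Client("API_KEY", "localhost:8080")

@aq.op
def predict_churn(customers):
    # Assumption: on a Spark-based engine, the operator receives a Spark
    # DataFrame, so the transform uses Spark's DataFrame API.
    return customers.withColumn("churn_risk", customers["inactive_days"] > 30)

# "demo" is a hypothetical name for a previously connected data source.
customers = client.integration("demo").sql("SELECT * FROM customers")
predictions = predict_churn(customers)

# "my_databricks" is a hypothetical name for a previously connected
# Databricks integration; Aqueduct deploys and orchestrates the flow there.
client.publish_flow(
    name="churn_predictions",
    artifacts=[predictions],
    engine="my_databricks",
)
```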
Our focus here is to bring Aqueduct’s simple pipeline definition and detailed visibility to the power and scale of your existing infrastructure. We’re just getting started with our integration suite, so if there are systems you’d love to see, let us know!
The other pain point we’ve heard consistently in our conversations is that the proliferation of ML infrastructure has made it difficult (or impossible) for teams to know what ML code is running where, who’s responsible for it, and whether it’s working as expected.
From the beginning, Aqueduct’s been focused on gathering and centralizing the metadata associated with ML pipelines, and over the last few months, we’ve made significant improvements in how this metadata is presented.
Aqueduct v0.1.6 added a new metadata view that shows, at a glance, what workflows are running, where they’re running, what their status is, and how their associated metrics are performing.
The same visibility is available for the models and datasets created by your workflows. These views are searchable and sortable, and in coming releases, we’ll add support for filtering them and saving view configurations, so you can quickly find the metadata you need. Our goal is to make it easy for you to find the workflows, models, and data you care about, have confidence that they’re behaving as expected, and easily triage when things go awry.
We’re thrilled about the progress we’ve made, and there’s a lot more to come.
If you’re interested in learning more, try out the open-source project, star us on GitHub, or join our Slack community to say hi. We’d love to hear from you!