Vikram Sreekanti
As we’ve been building Aqueduct, we’ve spoken with over 200 data & ML teams, and we’ve asked each of them the same question: how do you use the machine learning models you build?
The popular perception is that models are deployed as REST endpoints, and predictions are made by querying those endpoints. We spent years at Berkeley working on making those endpoints faster, more cost-efficient, and easier to deploy.
For some use cases, this is a great solution — when predictions cannot be precomputed (often because the space of possible inputs is too large), it makes sense to generate them when needed. Common examples are algorithmic feed generation, activity-based recommendation, and real-time speech recognition.
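For concreteness, the online pattern usually looks something like the sketch below: a model loaded into a web service that computes each prediction at request time. The framework (FastAPI) and the pickled model file are illustrative assumptions here, not a prescription.

```python
# A minimal online-serving sketch: one model behind one REST endpoint.
# FastAPI and the pickled "model.pkl" file are illustrative assumptions.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup so each request only pays inference cost.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    # Predictions are computed on demand, one request at a time.
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}
```

This is also exactly the service you then have to host, scale, and monitor around the clock, which matters for the use cases below.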
However, most of the data teams we’ve spoken to don’t have these types of problems. Instead, they build models that make predictions on fixed input data that’s updated on a relatively long timescale — days, weeks, or even occasionally months. The predictions are correspondingly updated on the same timescale as the data.
The list of examples for periodically updated predictions is much longer — we’ve come across everything from churn prediction and lead scoring for business teams to route planning, credit risk scoring, and carbon offset management.
Deploying a REST endpoint for each of these models is not only unnecessary but unwise. The time required to set up real-time prediction infrastructure, the ongoing maintenance burden, and the monetary cost of a long-running service are all prohibitively high. Deploying an offline batch job is complicated enough — why go out of your way to make things even harder?
We’ve found that most of the teams with batch-style use cases for their models simply publish their predictions as… more data. That data can be published in a number of places: databases or data warehouses, business systems like Salesforce or Google Sheets, and even as reports shared via Slack or email.
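In code, this pattern can be as small as a script that runs on the data’s cadence: read the latest inputs, score them, and write the results back as a table. Here’s a minimal sketch; the warehouse connection string, table names, column names, and model file are all hypothetical.

```python
# A minimal batch-prediction sketch: read inputs, score them, and publish
# the results as a plain table. The connection string, table names, column
# names, and model file below are all hypothetical.
import pickle

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@warehouse:5432/analytics")

with open("churn_model.pkl", "rb") as f:
    model = pickle.load(f)

# Read the latest feature snapshot. This script runs on the data's cadence
# (daily, weekly, monthly) rather than per-request.
features = pd.read_sql("SELECT * FROM customer_features", engine)

# Score every row in a single pass.
inputs = features.drop(columns=["customer_id"])
scored = features.assign(churn_score=model.predict_proba(inputs)[:, 1])

# Publish the predictions as... more data: a table anyone can query or
# sync into Salesforce, Google Sheets, or a Slack report.
scored[["customer_id", "churn_score"]].to_sql(
    "churn_predictions", engine, if_exists="replace", index=False
)
```

There’s no service to keep alive between runs; the predictions simply live next to the rest of the team’s data.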
In addition to reducing complexity, publishing predictions as data has a number of other benefits. It’s become clear to us that publishing predictions as a dataset is a much smarter, more effective solution than defaulting to deploying a REST endpoint.
But while deploying batch prediction pipelines is easier than deploying REST endpoints, it’s certainly not easy. Most of the teams we’ve talked to have strung together a combination of orchestration tools (e.g., Airflow), a Kubernetes cluster, and custom code to publish predictions — we’ve been calling this StackOverflow Infrastructure, and the sketch below shows the kind of glue it involves.
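To make that concrete, here’s a hypothetical sketch of that glue: an Airflow DAG (2.4+ TaskFlow style) wrapping the same read-score-publish steps, with the schedule and task bodies as placeholders.

```python
# A sketch of the glue behind "StackOverflow Infrastructure": an Airflow
# DAG (2.4+ TaskFlow style) wrapping read-score-publish steps. The
# schedule and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@weekly", start_date=datetime(2023, 1, 1), catchup=False)
def churn_predictions():
    @task
    def extract():
        # Pull the latest feature snapshot from the warehouse.
        ...

    @task
    def score(features):
        # Load the model and generate predictions, often on Kubernetes.
        ...

    @task
    def publish(predictions):
        # Write predictions back to the warehouse or a business system.
        ...

    publish(score(extract()))

churn_predictions()
```

None of this is rocket science, but every team ends up writing, scheduling, and babysitting some variant of it by hand.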
That’s one of the core challenges we’re solving with Aqueduct. If you’re interested in learning more, join our community or say hello!