Chenggang Wu
Vikram Sreekanti
Yesterday, the team at Databricks published Dolly v2, which they say is the first truly open-source and commercially viable LLM. As we discussed last week, we’re super excited about the potential of open LLMs.
Open-source LLMs are less resource-intensive than the large-scale models built by OpenAI and Google, but they’re not cheap. Dolly requires ~60GB of RAM and a “relatively powerful GPU” (an Nvidia A10) for inference; most of us probably don’t have those kinds of resources on our Macs.
That brings us back to running cloud infrastructure to do machine learning, which is a complex mess. We came across a kindred soul on Hacker News this morning who tried something similar and ended up waiting 12 minutes for a single inference.
That’s where Aqueduct comes in. With a couple lines of Python, you can write and invoke a function that uses Dolly, running on a GPU on Kubernetes with all the required resources:
import aqueduct as aq
client = aq.Client()
k8s_integration_name = "eks-us-east-2" # <- FILL ME IN
@aq.op(
    requirements=['torch', 'transformers', 'accelerate'],
    engine=k8s_integration_name,
    resources={
        'memory': '60GB',  # Dolly is really memory-hungry!
        'gpu_resource_name': 'nvidia.com/gpu',
    },
)
def use_dolly(prompt: str):
    import torch
    from transformers import pipeline

    # Load the 12B-parameter Dolly v2 model from Hugging Face and
    # let accelerate place it across the available GPU(s).
    instruct_pipeline = pipeline(
        model="databricks/dolly-v2-12b",
        trust_remote_code=True,
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )
    return instruct_pipeline(prompt)
use_dolly('What is the best LLM?').get()
> The best LLM is Neutral Little Mac. It is the most balanced meal plan
> out of all the low-carb diets. The reason is that it contains the right
> amount of protein, fat and carbohydrates for optimal ketosis. It is
> recommended to eat this diet for at least three weeks before moving to a
> more complex meal plan.
# 😬 It looks like our prompt engineering or Dolly itself
# leaves something to be desired for this particular question.
With the @op decorator, you can run Dolly in your cloud seamlessly. Aqueduct will automatically package up your code in a Docker container, install the correct CUDA drivers, and run your function on a GPU in Kubernetes. (In fact, you can even have Aqueduct create & manage a Kubernetes cluster for you!)
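If you haven’t registered a cluster yet, here’s a rough sketch of what connecting an existing EKS cluster looks like. We’re assuming the connect_integration call and the config keys below match your SDK version; check the Aqueduct docs for the exact signature:

# A minimal sketch of connecting an existing Kubernetes cluster to Aqueduct.
# The config keys ('kubeconfig_path', 'cluster_name') are assumptions and may
# differ across SDK versions -- consult the Aqueduct docs before copying this.
client.connect_integration(
    name='eks-us-east-2',      # Matches k8s_integration_name above.
    service='Kubernetes',
    config={
        'kubeconfig_path': '~/.kube/config',
        'cluster_name': 'my-eks-cluster',  # Hypothetical cluster name.
    },
)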
Now that we have a model running, connecting it to your data with Aqueduct is simple:
db = client.integration('my-snowflake-db') # Any DB connected to Aqueduct.
data = db.sql('SELECT * FROM reviews_data;')
use_dolly(data)
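From there, you can write Dolly’s responses back to your database and publish the whole pipeline as a recurring flow. Here’s a sketch under a few assumptions: the table name and hourly schedule are illustrative choices of ours, and we’re assuming the save and publish_flow calls match your version of the Aqueduct SDK:

# A sketch of persisting results and scheduling the pipeline.
# 'dolly_responses' and the hourly cadence are illustrative, not prescriptive.
responses = use_dolly(data)
db.save(responses, table_name='dolly_responses', update_mode='replace')

# Publish a named flow that re-runs this pipeline every hour.
client.publish_flow(
    name='dolly-reviews',
    artifacts=[responses],
    schedule=aq.hourly(),
)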
The rate of evolution of LLM technology in recent months has been shockingly fast. With Dolly v2, these capabilities can be used on your private data, but unfortunately, getting models running in the cloud is still a nightmare.
That’s what we’re focused on solving with Aqueduct. We’re adding one-line-of-code configuration for popular LLMs, along with a number of other features to make this process easier. If you’re interested in running LLMs, please join our community Slack or open an issue on GitHub!