The easiest way to run open LLMs

Vikram Sreekanti

While LLMs have generated significant discussion in recent months, commercial use continues to be limited. We’ve heard from ML teams at companies large and small that using hosted, proprietary LLMs like GPT is a non-starter due to concerns around data privacy, regulatory compliance, IP ownership, and cost.

Open-source LLMs like LLaMa, Vicuna, and Dolly have enabled teams to consider using LLMs for internal business applications, but they’re incredibly difficult to operate. At Aqueduct, our goal is to enable teams to do machine learning in the cloud without dealing with operational nightmares.

Today, we’re incredibly excited to share that as of Aqueduct v0.3, you can run open-source LLMs in the cloud with a single API call. Aqueduct is now the easiest way to run open LLMs in your cloud.

Our LLM support combines seamlessly with our existing support for running machine learning tasks on your existing cloud infrastructure. As of this release, we’ve added:

  • a new llm_op API that allows you to run an LLM from Aqueduct with no extra cloud configuration
  • a new aqueduct-llm library that allows you to write custom operators in Aqueduct that use one or more open-source LLMs
    • aqueduct-llm is a standalone pip package that you can run on a server with a GPU as well (see the sketch after this list)
  • support for open-source versions of LLaMa, Dolly, and Vicuna, with StableLM and Alpaca to follow soon*
    • automatic resource allocation optimized on a per-model basis**
  • the ability to track LLM-specific metadata such as prompts and model parameters in your LLM-powered pipelines
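If you’d rather call a model outside of an Aqueduct pipeline, the standalone aqueduct-llm package can be used directly on a GPU machine. The snippet below is a minimal sketch; the per-model module and generate call are assumptions, so check the aqueduct-llm documentation for the exact names:

# Assumed interface: a per-model module with a generate() call.
# Requires a machine with a GPU large enough to hold the 7B model.
from aqueduct_llm import vicuna_7b

response = vicuna_7b.generate("What is the best LLM?")
print(response)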

You can see the full documentation here. Running an LLM with Aqueduct is now this easy:

import aqueduct as aq
client = aq.Client()

# Create an operator backed by the open-source Vicuna 7B model, running on
# the Kubernetes cluster registered with Aqueduct as "k8s-us-east-2".
vicuna = aq.llm_op(name="vicuna_7b", engine="k8s-us-east-2")

# Call the operator like a regular Python function; .get() retrieves the result.
generated_text = vicuna("What is the best LLM?")
generated_text.get()
>>> There is no definitive answer to this question, as "best" is subjective 
>>> and depends on the specific use case. However, some of the most popular 
>>> large language models include GPT-3, BERT, and XLNet.

You can even process a full dataset with an LLM:

# Read the `review` column from each row and write the model's output to
# a new `response` column.
vicuna = aq.llm_op(
    name="vicuna_7b",
    engine="k8s-us-east-2",
    column_name="review",
    output_column_name="response",
)

db = client.resource("snowflake")
hotel_reviews = db.sql("SELECT * FROM hotel_reviews;")

# The {text} placeholder in the prompt is filled in with each row's review text.
responses = vicuna(
    hotel_reviews,
    {
        "prompt": """
            Given the following hotel review, generate the appropriate response from
            customer service:

            {text}
        """
    },
)
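Since responses is a regular Aqueduct artifact, you can persist it back to your warehouse and publish the whole pipeline like any other workflow. The sketch below is illustrative: the table and flow names are made up, and it assumes a save/publish_flow interface along these lines.

# Write the generated responses back to Snowflake (table name is illustrative).
db.save(responses, table_name="hotel_review_responses", update_mode="replace")

# Publish the pipeline so it can be tracked and re-run by Aqueduct.
flow = client.publish_flow(
    name="hotel_review_responder",
    artifacts=[responses],
)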

If you’re interested in learning more, check out our documentation, try it out, and join our Slack community. We have a lot more LLM features planned, so please reach out if you have ideas!


*Please note that many of these models — Vicuna, Alpaca, and LLaMa in particular — are released under licenses that do not allow for commercial use.

**Depending on which models you use, resource requirements will vary. You can use the resources argument to customize your resources or have Aqueduct automate it. All models require GPU support; we recommend running them on a large server with GPUs available or on Kubernetes.
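For example, you could pin the memory and GPU allocation yourself rather than relying on the per-model defaults. The keys below are illustrative; the exact resource names depend on your Aqueduct version and Kubernetes setup:

vicuna = aq.llm_op(
    name="vicuna_7b",
    engine="k8s-us-east-2",
    resources={
        "memory": "32GB",                        # illustrative; size for the model you run
        "gpu_resource_name": "nvidia.com/gpu",   # your cluster's GPU resource name
    },
)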
