Like most of us, I spent last weekend thinking about the past, present, and future of AI. It's hard to imagine the industry without OpenAI at its forefront; it's an institution we all look to, respect, and rely on. I, for one, am rooting for the company and the brilliant folks who work there to continue leading the way. However, I also realized that the AI ecosystem benefits from being interoperable and open by default, and that we at Braintrust have an important role to play in that.
For months, a few key challenges in AI development have been nagging me:
This weekend felt like the perfect time to address these challenges. I'm very excited to announce the newest feature of Braintrust: an AI proxy. The proxy addresses the above pain points by embracing OpenAI's interface as the lingua franca for LLMs and adding caching, logging, and API key management behind the scenes. It supports all of OpenAI's and Anthropic's models, as well as popular open source models like LLaMa 2 and Mistral via Perplexity.
If you have something built on GPT-4 or another model, you can now try it out on LLaMa 2, Mistral, Claude, or others (and vice versa) without changing anything beyond the model name. We believe this is just the start, and that the AI proxy will enable our current and future customers to build robust, low-latency systems that work across a thriving, open ecosystem of model providers.
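Treating OpenAI's interface as the lingua franca means the proxy has to translate requests for providers whose APIs look different. As a minimal sketch of one such step, the function below flattens OpenAI-style chat messages into the `\n\nHuman:` / `\n\nAssistant:` prompt format used by Anthropic's legacy text completions API (the API behind models like claude-instant-1.2). The name `toAnthropicPrompt` and the choice to place system text before the first turn are assumptions for illustration, not Braintrust's actual implementation.

```typescript
// Hypothetical sketch: translate OpenAI-style chat messages into the
// "\n\nHuman: ... \n\nAssistant:" prompt used by Anthropic's legacy
// text completions API. Not Braintrust's actual implementation.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

function toAnthropicPrompt(messages: ChatMessage[]): string {
  let prompt = "";
  for (const m of messages) {
    if (m.role === "system") {
      // Assumption: system text is placed before the first Human turn.
      prompt += m.content;
    } else if (m.role === "user") {
      prompt += `\n\nHuman: ${m.content}`;
    } else {
      prompt += `\n\nAssistant: ${m.content}`;
    }
  }
  // End with an open Assistant turn so the model knows to respond.
  return prompt + "\n\nAssistant:";
}

const prompt = toAnthropicPrompt([
  { role: "system", content: "Be concise." },
  { role: "user", content: "What is a proxy?" },
]);
console.log(JSON.stringify(prompt)); // "Be concise.\n\nHuman: What is a proxy?\n\nAssistant:"
```

The caller keeps sending the same OpenAI-shaped request; only the proxy needs to know about the provider-specific format.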
Before we get into the details, here's a quick demo and instructions for trying it out. You can use your favorite OpenAI client library and set the base URL to https://api.braintrust.dev/v1/proxy. Try running the following script (shown here in TypeScript) twice.
```typescript
import OpenAI from "openai";

// The client reads your API key from the OPENAI_API_KEY environment variable.
const client = new OpenAI({
  baseURL: "https://api.braintrust.dev/v1/proxy",
});

async function main() {
  const start = performance.now();
  const response = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "What is a proxy?" }],
    seed: 1, // A seed activates the proxy's cache
  });
  console.log(response.choices[0].message.content);
  // The second run should be much faster, since it's served from the cache.
  console.log(`Took ${(performance.now() - start) / 1000}s`);
}

main();
```
If you have access to Perplexity or Anthropic, feel free to use their API keys with mistral-7b-instruct or claude-instant-1.2 instead. Under the hood, we're proxying the requests, caching the results with end-to-end encryption, and streaming back the results.
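Caching a streamed response comes down to recording the chunks as they pass through and replaying them on a cache hit. Here is a minimal in-memory sketch of that idea. `cachedStream`, the `Map`-based cache, and the fake upstream generator are hypothetical stand-ins, and the sketch leaves out the encryption the real proxy applies to stored results.

```typescript
// Hypothetical sketch of stream caching: on a miss, forward chunks from the
// upstream provider while recording them; on a hit, replay the recorded
// chunks without calling upstream. In-memory and unencrypted.
type Upstream = () => AsyncIterable<string>;

const streamCache = new Map<string, string[]>();

async function* cachedStream(key: string, upstream: Upstream): AsyncGenerator<string> {
  const hit = streamCache.get(key);
  if (hit !== undefined) {
    // Cache hit: replay the recorded chunks in order.
    for (const chunk of hit) yield chunk;
    return;
  }
  const recorded: string[] = [];
  for await (const chunk of upstream()) {
    recorded.push(chunk);
    yield chunk; // Pass each chunk to the caller as soon as it arrives.
  }
  // Only reached when the stream completes, so aborted streams aren't cached.
  streamCache.set(key, recorded);
}

// Fake upstream model that counts how often it is actually called.
let upstreamCalls = 0;
async function* fakeModel(): AsyncGenerator<string> {
  upstreamCalls++;
  yield "A proxy ";
  yield "forwards requests.";
}

// Consume a stream into a single string.
async function collect(key: string): Promise<string> {
  let text = "";
  for await (const chunk of cachedStream(key, fakeModel)) text += chunk;
  return text;
}
```

Calling `collect` twice with the same key reaches the fake upstream only once; the second call is served entirely from the recorded chunks, still chunk by chunk.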
Read on to learn more technical details, or check out the docs.
The AI proxy is optimized for a few key objectives: low latency, security, and portability:
claude-2 for gpt-4. We'll figure out a way to make this scale, via open source or other forms of open collaboration.
The feature I'm personally most excited by is the cache. When I'm writing code, I like to tinker and re-run things often. For example, I was recently working on an LLM-based comparator for lists of strings and kept tweaking the threshold at which I use an LLM to compare two strings. Each time I changed the threshold, most of the string comparisons stayed the same, but I'd waste minutes (!) waiting for GPT-4 to recompute the same results.
The AI proxy solves this problem by caching model calls for you, both ordinary and streaming. By default, requests that set temperature=0 or the new seed parameter are cached. You can also set the x-bt-use-cache header to always or never to control this behavior directly.
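The default caching rule above fits in a small decision function. This sketches the stated behavior only; the name `shouldUseCache` and the request shape are assumptions, and the real proxy may take other fields into account.

```typescript
// Hypothetical sketch of the proxy's default caching rule.
// The x-bt-use-cache header can force caching on ("always") or off ("never");
// otherwise only requests that ask for deterministic output are cached.
interface ChatRequest {
  model: string;
  temperature?: number;
  seed?: number;
}

function shouldUseCache(req: ChatRequest, cacheHeader?: string): boolean {
  if (cacheHeader === "always") return true;
  if (cacheHeader === "never") return false;
  // Default: cache when temperature is 0 or a seed is supplied.
  return req.temperature === 0 || req.seed !== undefined;
}

console.log(shouldUseCache({ model: "gpt-4", seed: 1 })); // true
console.log(shouldUseCache({ model: "gpt-4", temperature: 0.7 })); // false
```

With the OpenAI Node SDK, a header like this can be sent per request through the options argument to `create`, for example `{ headers: { "x-bt-use-cache": "always" } }`.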
Because cached values are encrypted using your API key, the cache is not shared across users. Braintrust customers can choose to share cached values across users in their organization.
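One way to get this kind of isolation is to derive both the cache lookup key and the encryption key from the caller's API key, so a different API key can neither find nor decrypt another user's entries. The sketch below does this with Node's built-in crypto module (HMAC-SHA-256 for the lookup key, AES-256-GCM for the stored value). The scheme and the helper names are assumptions for illustration, not a description of Braintrust's actual encryption.

```typescript
import { createCipheriv, createDecipheriv, createHash, createHmac, randomBytes } from "node:crypto";

// Hypothetical scheme: everything is keyed by the caller's API key.
function encryptionKey(apiKey: string): Buffer {
  // 32-byte AES key derived from the API key (a real system might use HKDF).
  return createHash("sha256").update(`enc:${apiKey}`).digest();
}

function cacheKey(apiKey: string, requestBody: string): string {
  // Same request + same API key gives the same lookup key; other keys differ.
  return createHmac("sha256", apiKey).update(requestBody).digest("hex");
}

function encrypt(apiKey: string, plaintext: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", encryptionKey(apiKey), iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv (12 bytes) | auth tag (16 bytes) | ciphertext together.
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decrypt(apiKey: string, stored: Buffer): string {
  const iv = stored.subarray(0, 12);
  const tag = stored.subarray(12, 28);
  const decipher = createDecipheriv("aes-256-gcm", encryptionKey(apiKey), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(stored.subarray(28)), decipher.final()]).toString("utf8");
}
```

Because GCM authenticates the ciphertext, decrypting with the wrong API key fails loudly instead of returning garbage.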
You can use your OpenAI, Anthropic, and Perplexity API keys to access their respective models. However, if you're a Braintrust user, you can create a single API key that works across services (even on the free plan). Create a Braintrust account and enter an API key for each provider you'd like to use.

Then, pass your Braintrust API key into the SDK instead of a provider-specific one. When you issue a request, the proxy will translate your Braintrust API key into the appropriate secret behind the scenes. We believe this pattern is very powerful — you can configure how you want the proxy to behave behind the scenes without changing a single line of code — and are already working on features like deeper integration within Braintrust's evaluation and logging tools, load balancing, and model routing.
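Conceptually, translating a Braintrust API key is a lookup: pick the provider from the requested model, then swap in the secret stored for that provider. The sketch below assumes a simple model-name prefix rule and an in-memory secrets map; the prefixes, `providerForModel`, and `resolveSecret` are hypothetical, and a real deployment would keep secrets encrypted and scoped per organization.

```typescript
// Hypothetical sketch: map a requested model to a provider, then replace the
// caller's Braintrust key with the secret stored for that provider.
type Provider = "openai" | "anthropic" | "perplexity";

function providerForModel(model: string): Provider {
  if (model.startsWith("gpt-")) return "openai";
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("mistral-") || model.startsWith("llama-")) return "perplexity";
  throw new Error(`No provider known for model ${model}`);
}

// Provider secrets a user entered in their account, keyed by Braintrust API key.
type SecretStore = Map<string, Partial<Record<Provider, string>>>;

function resolveSecret(store: SecretStore, braintrustKey: string, model: string): string {
  const provider = providerForModel(model);
  const secret = store.get(braintrustKey)?.[provider];
  if (secret === undefined) {
    throw new Error(`No ${provider} key configured for this account`);
  }
  return secret;
}

// Example store with placeholder secrets (no Perplexity key configured).
const store: SecretStore = new Map<string, Partial<Record<Provider, string>>>([
  ["bt-key-123", { openai: "sk-openai-placeholder", anthropic: "sk-ant-placeholder" }],
]);
```

Here `resolveSecret(store, "bt-key-123", "claude-2")` returns the Anthropic placeholder, while a request for mistral-7b-instruct fails because no Perplexity key was entered. Keeping this mapping on the server is what lets the client code stay unchanged.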
The AI proxy is available for all to use, for free, as a beta. You get a common interface across providers and caching out of the box. If you create a Braintrust account, you can also configure a single API key that works across OpenAI, Anthropic, and Perplexity, and access other powerful features like [evaluations](https://www.braintrustdata.com/docs/evaluate), logging, and more.
We hope it's straightforward enough that you'll simply use https://api.braintrust.dev/v1/proxy as your default base URL. We're also interested in adding features and providers to the proxy and, of course, fixing bugs and improving performance. Let us know your thoughts by email or on Discord.