✨ Maxim AI February 2025 Update

Feature spotlight

🚀 Agent simulation

Teams today spend countless hours manually going back and forth with their agents to assess quality and identify failure modes. And it is not a one-time task: unintended regressions happen all the time, whether from model updates or from optimizations driven by user feedback.

Introducing Maxim’s agent simulation! Now, you can evaluate your AI agents in just three simple steps:

1️⃣ Define the real-world scenarios, user personas, and any other context you want to test your agents on.
2️⃣ Pick the right evaluators, from predefined evaluators to custom metrics or human reviews, for your use case and trigger a test run.
3️⃣ Analyze your agent’s performance, debug issues, and iterate.

The best part? Simply bring your agents via an API endpoint and get started in minutes! We're launching this feature on Product Hunt—click "Notify me" to stay updated: Link.
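To make "bring your agents via an API endpoint" concrete, here is a rough sketch of what such an endpoint handler could look like. The field names (`message`, `reply`) are placeholders for illustration only; the actual request/response contract is defined by Maxim:

```python
# A minimal sketch of an agent endpoint handler. Field names are
# hypothetical; Maxim defines the actual request/response contract.
def handle_agent_request(payload: dict) -> dict:
    """Accept one simulated user turn and return the agent's reply."""
    user_message = payload["message"]
    # A real agent would run its LLM / tool-use loop here.
    reply = f"Echo: {user_message}"
    return {"reply": reply}

# Mounted behind any web framework (FastAPI, Flask, ...), this handler
# becomes the endpoint URL you point the simulation run at.
print(handle_agent_request({"message": "Where is my order?"}))
# -> {'reply': 'Echo: Where is my order?'}
```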

🔑 2FA is now enabled for all plans

Teams store sensitive customer data, API keys, and production logs on the Maxim platform—and we take that trust seriously. To enhance account security, Maxim now supports two-factor authentication (2FA), adding an extra layer of protection against unauthorized access to your data.

Enable 2FA for all users by navigating to Settings ➡️ Organization info ➡️ Two-factor authentication.

🧩 Ollama models are supported on Maxim!

Maxim now supports over 6500 models through Ollama, enabling you to test prompts and workflows on your local machine with ease. Key benefits:

  • Local testing: Run evaluations locally using Ollama with enhanced data privacy without relying on cloud uploads.
  • Easy setup: Quickly enable the Ollama provider and use the model of your choice by going to Settings ➡️ Models ➡️ Ollama. Learn more.
  • Open WebUI support: Run and interact with LLMs entirely offline or within your privately hosted environment.
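For local testing outside the Maxim UI, a running Ollama server is also reachable directly over its HTTP API (default port 11434). A minimal sketch, assuming Ollama is installed locally and the model name (`llama3.2` here) is one you have already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Construct a completion request for a locally running Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON response instead of a token stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_request("llama3.2", "Summarize retrieval-augmented generation in one sentence.")
# Sending it: urllib.request.urlopen(req) -- requires Ollama running locally.
```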

🧠 Claude 3.7 Sonnet is live on Maxim!

Claude 3.7 Sonnet, Anthropic's latest reasoning-focused model, is now available on Maxim. Design custom evaluators and prompt experiments that leverage this model's analytical capabilities.

Start using this model on Maxim via the Anthropic provider:
✅ Go to Settings > Models > Anthropic and add Claude 3.7 Sonnet.

💬 In-app support for customers

Need help? With our in-app support, you can get assistance, ask questions, and share feedback at the click of a button. With 24/7 support, we're here to make sure you're never blocked and can stay focused on building, testing, and scaling your AI workflows.

Upcoming release

🔗 Prompt chains: Now more powerful

We have revamped our prompt chains feature, making it easier to prototype every step of your complex agentic workflows. You can now create parallel chains to execute concurrent or conditional tasks, move data more flexibly between nodes, and get better visibility into how each block behaves through our refined user interface.

⚡Bifrost: One interface, any LLM!

We're open-sourcing Bifrost, the unified LLM gateway we built to power all LLM communication at Maxim. Written in pure Go, Bifrost features a robust plugin system that lets you add layers such as RAG, guardrails, and more with minimal effort. Having processed 100B+ tokens across 100+ models and 7+ providers, Bifrost is the backbone of our infrastructure, and we're thrilled to share it with the community.
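The core idea behind a unified gateway, independent of Bifrost's actual Go implementation, is a single chat interface with pluggable providers and a middleware chain. A conceptual sketch (names and shapes are illustrative, not Bifrost's API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ChatRequest:
    provider: str
    model: str
    messages: List[str]

# A plugin transforms each request before it reaches a provider
# (e.g. guardrails, RAG context injection, logging).
Plugin = Callable[[ChatRequest], ChatRequest]

class Gateway:
    """One chat() entry point routing to any registered provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[ChatRequest], str]] = {}
        self._plugins: List[Plugin] = []

    def register_provider(self, name: str, handler: Callable[[ChatRequest], str]) -> None:
        self._providers[name] = handler

    def use(self, plugin: Plugin) -> None:
        self._plugins.append(plugin)

    def chat(self, request: ChatRequest) -> str:
        for plugin in self._plugins:       # run the middleware chain in order
            request = plugin(request)
        return self._providers[request.provider](request)
```

Swapping providers then means changing one field on the request, while plugins apply uniformly across all of them.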

Knowledge nuggets

🏗️ Architecting a stateless tracing SDK for GenAI

GenAI systems challenge traditional observability by introducing unpredictability and semantic variability that conventional tools can't capture. Maxim addresses this gap with a fully stateless, OpenTelemetry-compatible tracing system tailored for GenAI applications.

Our solution ensures end-to-end visibility by capturing every interaction—spanning LLM calls, function chains, and backend components—without requiring storage or state. This approach enables early detection of embedding drift, hallucinations, and performance degradation, transforming observability into an operational necessity for maintaining system reliability. Read more in our blog.
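The stateless pattern described above can be sketched in a few lines: every emitted record carries its full identity (trace id, span id, parent), so the SDK keeps no in-memory session between calls and any process can report into the same trace independently. This is a conceptual sketch of the pattern, not Maxim's SDK API; all names are illustrative:

```python
import time
import uuid

def emit_span(sink, trace_id: str, parent_span_id, name: str, attributes: dict) -> str:
    """Build and ship one self-contained span record; no state is held between calls."""
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex,
        "parent_span_id": parent_span_id,
        "name": name,
        "timestamp": time.time(),
        "attributes": attributes,
    }
    sink(span)  # stand-in for a network write to the collector
    return span["span_id"]

# Example: an LLM call reported as a child span of a request trace.
records = []
trace_id = uuid.uuid4().hex
root = emit_span(records.append, trace_id, None, "handle_request", {})
emit_span(records.append, trace_id, root, "llm_call", {"model": "placeholder-model"})
```

Because each record is complete on its own, spans from different services or threads can be stitched back into one trace purely by their ids.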