
Red teaming with auto-generated rewards and multi-step RL
Introduction
Making AI systems such as large language models (LLMs) robust against adversarial inputs is a critical area of research. One approach to identifying vulnerabilities in AI models is red teaming, in which adversarial prompts or attacks are crafted to expose weaknesses. However, generating attacks that are both diverse and effective remains a significant challenge.