Overview
Several Gaussia metrics (Context, Conversational, BestOf, Agentic) use an LLM-as-a-Judge pattern to evaluate AI responses. The Judge class handles prompt rendering, model invocation, and response parsing.
How it works
The Judge supports two evaluation modes:
| Mode | How it works | Best for |
|---|---|---|
| Structured output | Uses LangChain’s create_agent with response_format for schema-validated responses | Models that support structured outputs (GPT-4o, Gemini) |
| Regex extraction | Embeds JSON schema in the prompt, extracts from markdown code blocks | Any model, including open-source |
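The regex extraction mode can be illustrated with a minimal sketch using only the Python standard library. This is not Gaussia's implementation: the function name `extract_json` and its exact behavior are assumptions made for illustration. Only the two marker parameters and their defaults come from the configuration table on this page.

```python
import json
import re
from typing import Any, Dict, Optional

# Built programmatically so this example never contains a literal code fence.
FENCE = "`" * 3


def extract_json(
    text: str,
    bos_json_clause: str = FENCE + "json",  # default opening marker
    eos_json_clause: str = FENCE,           # default closing marker
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the first JSON object between the markers."""
    # Escape both markers so they are matched literally, and let "." span newlines.
    pattern = re.escape(bos_json_clause) + r"\s*(.*?)\s*" + re.escape(eos_json_clause)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None  # the model did not emit a marked block
    try:
        parsed = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # a block was found, but its body is not valid JSON
    return parsed if isinstance(parsed, dict) else None


# A reply shaped like what a judge model might return in regex mode.
reply = (
    "Here is my verdict:\n"
    + FENCE + "json\n"
    + '{"score": 0.8, "verdict": "pass"}\n'
    + FENCE + "\nDone."
)
print(extract_json(reply))  # {'score': 0.8, 'verdict': 'pass'}
```

Because this approach relies only on the model following a formatting instruction in the prompt, it works with models that lack native structured output support, at the cost of having to handle replies where the block is missing or malformed.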
Configuration
You configure the judge through the metric’s constructor parameters.
Parameters
| Parameter | Default | Description |
|---|---|---|
| model | required | Any LangChain BaseChatModel instance |
| use_structured_output | False | Use schema-validated structured output |
| strict | True | Enforce strict schema validation |
| bos_json_clause | ```json | Opening marker for JSON extraction (regex mode only) |
| eos_json_clause | ``` | Closing marker for JSON extraction (regex mode only) |
Compatible models
The Judge works with any LangChain-compatible chat model, that is, any BaseChatModel instance.
Reasoning extraction
When available, the Judge automatically extracts reasoning content from the model’s response. This is supported by models that provide chain-of-thought reasoning (e.g., OpenAI’s reasoning models, Anthropic’s extended thinking). The reasoning is returned as the first element of the tuple from judge.check() and is used internally for logging and debugging.