Overview

All conversation data in Gaussia is represented using two Pydantic models: Dataset for sessions and Batch for individual interactions.

Dataset

A Dataset represents one complete conversation session between a user and an assistant.
from gaussia.schemas.common import Dataset, Batch

dataset = Dataset(
    session_id="session-001",
    assistant_id="assistant-v2",
    language="english",
    context="Product documentation for the Acme Widget.",
    conversation=[
        Batch(
            qa_id="q1",
            query="How do I install the widget?",
            assistant="Run pip install acme-widget.",
            ground_truth_assistant="Install with: pip install acme-widget",
        ),
    ],
)

Fields

Field         Type         Description
session_id    str          Unique identifier for the conversation session
assistant_id  str          Identifier for the AI assistant being evaluated
language      str | None   Language of the conversation (default: "english")
context       str          Background context provided to the assistant
conversation  list[Batch]  Ordered list of interactions in this session

Batch

A Batch represents a single question–answer interaction.
batch = Batch(
    qa_id="q1",
    query="What is the return policy?",
    assistant="You can return items within 30 days.",
    ground_truth_assistant="Items can be returned within 30 days of purchase.",
    observation="The assistant correctly identified the return window.",
    weight=0.5,
)

Fields

Field                   Type          Default   Description
qa_id                   str           required  Unique identifier for this interaction
query                   str           required  The user’s question or input
assistant               str           required  The assistant’s actual response
ground_truth_assistant  str           required  The expected or reference response
observation             str | None    None      Additional notes about the interaction
weight                  float | None  None      Importance weight for aggregation (must be ≥ 0)
agentic                 dict | None   {}        Tool usage metadata (for the Agentic metric)
ground_truth_agentic    dict | None   {}        Expected tool usage (for the Agentic metric)
logprobs                dict | None   {}        Token log probabilities
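
The defaults and the weight constraint listed above can be mirrored in a small stand-in class. This is a minimal sketch built on stdlib dataclasses, not Gaussia's actual Pydantic model; the class name BatchSketch and the choice of ValueError are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchSketch:
    """Hypothetical stand-in mirroring the Batch fields (not Gaussia's model)."""

    # Required fields: no default, so they must be passed explicitly.
    qa_id: str
    query: str
    assistant: str
    ground_truth_assistant: str
    # Optional fields with the defaults documented for Batch.
    observation: Optional[str] = None
    weight: Optional[float] = None
    agentic: Optional[dict] = field(default_factory=dict)
    ground_truth_agentic: Optional[dict] = field(default_factory=dict)
    logprobs: Optional[dict] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The documented constraint: a weight, when set, must be >= 0.
        if self.weight is not None and self.weight < 0:
            raise ValueError("weight must be >= 0")
```

Using default_factory gives each instance its own empty dict, so metadata written to one interaction does not leak into another.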

Streamed batch

For stream-based processing (STREAM_BATCHES), individual interactions are wrapped in StreamedBatch:
from gaussia.schemas.common import Batch, SessionMetadata, StreamedBatch

streamed = StreamedBatch(
    metadata=SessionMetadata(
        session_id="session-001",
        assistant_id="assistant-v2",
        language="english",
        context="Product documentation.",
    ),
    batch=Batch(
        qa_id="q1",
        query="How do I install?",
        assistant="Run pip install.",
        ground_truth_assistant="Install with pip install.",
    ),
)
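
SessionMetadata carries the same session-level fields as Dataset, so a session can be thought of as flattening into one StreamedBatch per interaction. The sketch below illustrates that relationship under stated assumptions: the dataclasses are stand-ins so the example runs without Gaussia installed, and iter_streamed is a hypothetical helper, not part of the Gaussia API.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


# Stand-ins for the models in gaussia.schemas.common (assumed shapes,
# reduced to the fields this sketch needs).
@dataclass
class Batch:
    qa_id: str
    query: str
    assistant: str
    ground_truth_assistant: str


@dataclass
class SessionMetadata:
    session_id: str
    assistant_id: str
    language: Optional[str]
    context: str


@dataclass
class Dataset:
    session_id: str
    assistant_id: str
    language: Optional[str]
    context: str
    conversation: List[Batch]


@dataclass
class StreamedBatch:
    metadata: SessionMetadata
    batch: Batch


def iter_streamed(dataset: Dataset) -> Iterator[StreamedBatch]:
    """Hypothetical helper: yield one StreamedBatch per interaction, in order."""
    # Session-level fields are copied once and shared by every item.
    metadata = SessionMetadata(
        session_id=dataset.session_id,
        assistant_id=dataset.assistant_id,
        language=dataset.language,
        context=dataset.context,
    )
    for batch in dataset.conversation:
        yield StreamedBatch(metadata=metadata, batch=batch)
```

Each yielded item is self-describing: a consumer can process it without holding the rest of the session in memory.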

Weighting

The weight field on Batch controls how much each interaction contributes to the aggregated score:
  • No weights set: Equal weight (1/n) for all interactions
  • All weights set: Must sum to 1.0, otherwise Gaussia falls back to equal weights
  • Partial weights: The remaining budget (1.0 minus the sum of the weights that are set) is distributed equally among the unweighted interactions
conversation = [
    Batch(qa_id="q1", weight=0.6, ...),  # Critical question
    Batch(qa_id="q2", weight=0.4, ...),  # Less important
]
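
The three weighting rules can be sketched as a single function. This is an illustration of the documented behavior, not Gaussia's implementation: the name resolve_weights, the tolerance used for the sum check, and the fallback to equal weights when partial weights already exceed 1.0 are all assumptions.

```python
import math
from typing import List, Optional


def resolve_weights(weights: List[Optional[float]], tol: float = 1e-9) -> List[float]:
    """Hypothetical helper: turn per-interaction weights into final weights."""
    n = len(weights)
    if n == 0:
        return []
    equal = [1.0 / n] * n
    set_weights = [w for w in weights if w is not None]

    # Rule 1: no weights set, so every interaction gets 1/n.
    if not set_weights:
        return equal

    total = sum(set_weights)

    # Rule 2: all weights set. They must sum to 1.0, else fall back to equal.
    if len(set_weights) == n:
        if math.isclose(total, 1.0, abs_tol=tol):
            return list(set_weights)
        return equal

    # Rule 3: partial weights. Split the remaining budget equally among
    # the unweighted interactions.
    remaining = 1.0 - total
    if remaining < 0:
        # Assumption: the source does not cover this case; fall back to equal.
        return equal
    share = remaining / (n - len(set_weights))
    return [share if w is None else w for w in weights]
```

For example, resolve_weights([0.5, None, None]) returns [0.5, 0.25, 0.25], while resolve_weights([0.7, 0.7]) falls back to [0.5, 0.5] because the weights do not sum to 1.0.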

Some metrics (Context, Conversational) use the observation field as an alternative to ground_truth_assistant. When an observation is present, the judge prompt is adjusted to evaluate against the observation rather than the ground truth.
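
The choice of reference text can be pictured as a simple fallback. This is a minimal sketch only: pick_reference is a hypothetical helper, and treating "present" as "not None" is an assumption; how Gaussia actually builds the judge prompt may differ.

```python
from typing import Optional


def pick_reference(observation: Optional[str], ground_truth_assistant: str) -> str:
    """Hypothetical helper: choose the text the judge evaluates against."""
    # Assumption: "present" means not None, so an empty string still counts.
    if observation is not None:
        return observation
    return ground_truth_assistant
```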