Datasets and batches

Overview

All conversation data in Gaussia is represented using two Pydantic models: Dataset for sessions and Batch for individual interactions.

Dataset

A Dataset represents one complete conversation session between a user and an assistant.

from gaussia.schemas.common import Dataset, Batch

dataset = Dataset(
    session_id="session-001",
    assistant_id="assistant-v2",
    language="english",
    context="Product documentation for the Acme Widget.",
    conversation=[
        Batch(
            qa_id="q1",
            query="How do I install the widget?",
            assistant="Run pip install acme-widget.",
            ground_truth_assistant="Install with: pip install acme-widget",
        ),
    ],
)

Fields

| Field | Type | Description |
| --- | --- | --- |
| `session_id` | `str` | Unique identifier for the conversation session |
| `assistant_id` | `str` | Identifier for the AI assistant being evaluated |
| `language` | `str \| None` | Language of the conversation (default: `"english"`) |
| `context` | `str` | Background context provided to the assistant |
| `conversation` | `list[Batch]` | Ordered list of interactions in this session |

Batch

A Batch represents a single question–answer interaction.

batch = Batch(
    qa_id="q1",
    query="What is the return policy?",
    assistant="You can return items within 30 days.",
    ground_truth_assistant="Items can be returned within 30 days of purchase.",
    observation="The assistant correctly identified the return window.",
    weight=0.5,
)

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `qa_id` | `str` | required | Unique identifier for this interaction |
| `query` | `str` | required | The user’s question or input |
| `assistant` | `str` | required | The assistant’s actual response |
| `ground_truth_assistant` | `str` | required | The expected or reference response |
| `observation` | `str \| None` | `None` | Additional notes about the interaction |
| `weight` | `float \| None` | `None` | Importance weight for aggregation (must be ≥ 0) |
| `agentic` | `dict \| None` | `{}` | Tool usage metadata (for the Agentic metric) |
| `ground_truth_agentic` | `dict \| None` | `{}` | Expected tool usage (for the Agentic metric) |
| `logprobs` | `dict \| None` | `{}` | Token log probabilities |

Streamed batch

For stream-based processing (STREAM_BATCHES), individual interactions are wrapped in StreamedBatch:

from gaussia.schemas.common import Batch, SessionMetadata, StreamedBatch

streamed = StreamedBatch(
    metadata=SessionMetadata(
        session_id="session-001",
        assistant_id="assistant-v2",
        language="english",
        context="Product documentation.",
    ),
    batch=Batch(
        qa_id="q1",
        query="How do I install?",
        assistant="Run pip install.",
        ground_truth_assistant="Install with pip install.",
    ),
)
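
To make the relationship between a session and its streamed form concrete, the sketch below flattens one session into per-interaction records that each carry the same session metadata. It uses plain dataclasses as stand-ins that mirror the field names shown above; they are not Gaussia's Pydantic models, and `iter_streamed` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class BatchSketch:
    """Stand-in for Batch (required fields only)."""
    qa_id: str
    query: str
    assistant: str
    ground_truth_assistant: str


@dataclass
class SessionMetadataSketch:
    """Stand-in for SessionMetadata: the session-level fields."""
    session_id: str
    assistant_id: str
    context: str
    language: Optional[str] = "english"


@dataclass
class StreamedBatchSketch:
    """Stand-in for StreamedBatch: one interaction plus its session metadata."""
    metadata: SessionMetadataSketch
    batch: BatchSketch


def iter_streamed(
    metadata: SessionMetadataSketch, conversation: List[BatchSketch]
) -> Iterator[StreamedBatchSketch]:
    """Yield one streamed record per interaction, preserving conversation order."""
    for batch in conversation:
        yield StreamedBatchSketch(metadata=metadata, batch=batch)


metadata = SessionMetadataSketch(
    session_id="session-001",
    assistant_id="assistant-v2",
    context="Product documentation.",
)
conversation = [
    BatchSketch("q1", "How do I install?", "Run pip install.", "Install with pip install."),
    BatchSketch("q2", "How do I upgrade?", "Run pip install -U.", "Upgrade with pip install -U."),
]
streamed = list(iter_streamed(metadata, conversation))
```

Each streamed record is self-describing, so a consumer can process interactions one at a time without holding the whole session in memory.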

Weighting

The weight field on Batch controls how much each interaction contributes to the aggregated score:

- No weights set: Equal weight (1/n) for all interactions
- All weights set: Must sum to 1.0, otherwise Gaussia falls back to equal weights
- Partial weights: Remaining budget is distributed equally among unweighted interactions

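conversation = [
    # Critical question: 60% of the aggregated score
    Batch(qa_id="q1", query="...", assistant="...", ground_truth_assistant="...", weight=0.6),
    # Less important: the remaining 40%
    Batch(qa_id="q2", query="...", assistant="...", ground_truth_assistant="...", weight=0.4),
]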
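
The three weighting rules can be sketched in plain Python as follows. This is an illustration of the behavior described above, not Gaussia's implementation: `resolve_weights` is a hypothetical name, and the handling of negative weights and of partial weights that already exceed 1.0 (falling back to equal weights) is an assumption, since neither case is specified here.

```python
import math
from typing import List, Optional


def resolve_weights(weights: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: turn per-interaction weights into effective weights."""
    n = len(weights)
    if n == 0:
        return []
    equal = [1.0 / n] * n
    given = [w for w in weights if w is not None]

    # Rule 1: no weights set -> every interaction gets 1/n.
    if not given:
        return equal

    # Assumption: a negative weight is invalid, so fall back to equal weights.
    if any(w < 0 for w in given):
        return equal

    total = sum(given)

    # Rule 2: all weights set -> they must sum to 1.0, otherwise equal weights.
    if len(given) == n:
        return list(given) if math.isclose(total, 1.0, abs_tol=1e-9) else equal

    # Rule 3: partial weights -> split the remaining budget equally
    # among the interactions that have no weight.
    remaining = 1.0 - total
    if remaining < 0:
        return equal  # assumption: an over-allocated budget falls back to equal weights
    share = remaining / (n - len(given))
    return [share if w is None else w for w in weights]


print(resolve_weights([None, None]))       # no weights set
print(resolve_weights([0.6, 0.4]))         # all set, valid sum
print(resolve_weights([0.7, 0.7]))         # all set, invalid sum
print(resolve_weights([0.5, None, None]))  # partial weights
```

For example, with weights `[0.5, None, None]` the remaining budget of 0.5 is split between the two unweighted interactions, giving each 0.25.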

Some metrics (Context, Conversational) use the observation field as an alternative to ground_truth_assistant. When an observation is present, the judge prompt is adjusted to evaluate the response against the observation rather than the ground truth.
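
The selection between observation and ground truth might look like the sketch below. Both function names and the prompt wording are hypothetical and only illustrate the choice of reference; Gaussia's actual judge prompts are not shown in this page.

```python
from typing import Optional, Tuple


def pick_reference(
    ground_truth_assistant: str, observation: Optional[str] = None
) -> Tuple[str, str]:
    """Return (label, text) for the reference the judge should compare against."""
    # Assumption: "present" means the observation is not None.
    if observation is not None:
        return ("observation", observation)
    return ("ground truth", ground_truth_assistant)


def build_judge_prompt(
    query: str,
    assistant: str,
    ground_truth_assistant: str,
    observation: Optional[str] = None,
) -> str:
    """Assemble an illustrative judge prompt around the selected reference."""
    label, reference = pick_reference(ground_truth_assistant, observation)
    return (
        f"Question: {query}\n"
        f"Assistant answer: {assistant}\n"
        f"Evaluate the answer against this {label}: {reference}"
    )


prompt = build_judge_prompt(
    query="What is the return policy?",
    assistant="You can return items within 30 days.",
    ground_truth_assistant="Items can be returned within 30 days of purchase.",
    observation="The assistant correctly identified the return window.",
)
```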
