Overview
All conversation data in Gaussia is represented using two Pydantic models: `Dataset` for sessions and `Batch` for individual interactions.
Dataset
A `Dataset` represents one complete conversation session between a user and an assistant.
Fields
| Field | Type | Description |
|---|---|---|
| `session_id` | `str` | Unique identifier for the conversation session |
| `assistant_id` | `str` | Identifier for the AI assistant being evaluated |
| `language` | `str \| None` | Language of the conversation (default: `"english"`) |
| `context` | `str` | Background context provided to the assistant |
| `conversation` | `list[Batch]` | Ordered list of interactions in this session |
Batch
A `Batch` represents a single question–answer interaction.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `qa_id` | `str` | required | Unique identifier for this interaction |
| `query` | `str` | required | The user’s question or input |
| `assistant` | `str` | required | The assistant’s actual response |
| `ground_truth_assistant` | `str` | required | The expected or reference response |
| `observation` | `str \| None` | `None` | Additional notes about the interaction |
| `weight` | `float \| None` | `None` | Importance weight for aggregation (must be ≥ 0) |
| `agentic` | `dict \| None` | `{}` | Tool usage metadata (for the Agentic metric) |
| `ground_truth_agentic` | `dict \| None` | `{}` | Expected tool usage (for the Agentic metric) |
| `logprobs` | `dict \| None` | `{}` | Token log probabilities |
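The two field tables can be read together as a nested structure: one `Dataset` holding an ordered list of `Batch` objects. The sketch below mirrors those fields with standard-library dataclasses so that it runs without Gaussia installed. The classes are stand-ins, not Gaussia's actual Pydantic models, and the sample values and the `__post_init__` check are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Batch:
    """Stand-in for Gaussia's Batch model (fields taken from the table above)."""
    qa_id: str
    query: str
    assistant: str
    ground_truth_assistant: str
    observation: Optional[str] = None
    weight: Optional[float] = None
    agentic: Optional[dict] = field(default_factory=dict)
    ground_truth_agentic: Optional[dict] = field(default_factory=dict)
    logprobs: Optional[dict] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The table states that weight must be >= 0; how the real model
        # enforces this is an assumption here.
        if self.weight is not None and self.weight < 0:
            raise ValueError("weight must be >= 0")


@dataclass
class Dataset:
    """Stand-in for Gaussia's Dataset model (one conversation session)."""
    session_id: str
    assistant_id: str
    context: str
    conversation: list[Batch]
    language: Optional[str] = "english"


# Hypothetical sample data: one session with two interactions.
dataset = Dataset(
    session_id="session-001",
    assistant_id="assistant-v1",
    context="The assistant answers questions about store opening hours.",
    conversation=[
        Batch(
            qa_id="q1",
            query="When do you open on Monday?",
            assistant="We open at 9 am on Mondays.",
            ground_truth_assistant="The store opens at 9 am on Monday.",
        ),
        Batch(
            qa_id="q2",
            query="Are you open on Sunday?",
            assistant="No, we are closed on Sundays.",
            ground_truth_assistant="The store is closed on Sunday.",
            weight=0.7,
        ),
    ],
)
```

Only the four required `Batch` fields need to be supplied; the optional fields fall back to the defaults listed in the table.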
Streamed batch
For stream-based processing (`STREAM_BATCHES`), individual interactions are wrapped in `StreamedBatch`.
Weighting
The `weight` field on `Batch` controls how much each interaction contributes to the aggregated score:
- No weights set: Equal weight (`1/n`) for all interactions
- All weights set: Must sum to 1.0, otherwise Gaussia falls back to equal weights
- Partial weights: Remaining budget is distributed equally among unweighted interactions
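The three rules above can be sketched as a small function. This is an illustration of the described behavior, not Gaussia's implementation: the name `resolve_weights`, the floating-point tolerance, and the handling of partial weights that already exceed 1.0 are all assumptions.

```python
import math
from typing import Optional


def resolve_weights(weights: list[Optional[float]], tol: float = 1e-9) -> list[float]:
    """Return one effective weight per interaction (hypothetical helper)."""
    n = len(weights)
    if n == 0:
        return []

    equal = [1.0 / n] * n
    given = [w for w in weights if w is not None]

    # No weights set: every interaction gets 1/n.
    if not given:
        return equal

    total = sum(given)

    # All weights set: they must sum to 1.0, otherwise fall back to equal weights.
    if len(given) == n:
        return list(given) if math.isclose(total, 1.0, abs_tol=tol) else equal

    # Partial weights: split the remaining budget among unweighted interactions.
    remaining = 1.0 - total
    if remaining < 0:
        # Assumption: the source does not say what happens in this case,
        # so this sketch falls back to equal weights.
        return equal
    share = remaining / (n - len(given))
    return [share if w is None else w for w in weights]


print(resolve_weights([None, None, None, None]))  # no weights set
print(resolve_weights([0.5, 0.3, 0.2]))           # all set, sums to 1.0
print(resolve_weights([0.5, 0.5, 0.5]))           # all set, does not sum to 1.0
print(resolve_weights([0.5, None, None]))         # partial weights
```

In the partial case, an interaction weighted 0.5 in a three-interaction session leaves a budget of 0.5, so each unweighted interaction receives 0.25.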
The `observation` field is used by some metrics (Context, Conversational) as an alternative to `ground_truth_assistant`. When present, the judge prompt is adjusted to evaluate against the observation rather than the ground truth.
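The fallback described above amounts to choosing which text the judge compares against. A minimal sketch follows, assuming that "present" means the field is not `None`; `pick_reference` is a hypothetical helper, not part of Gaussia's API.

```python
from typing import Optional


def pick_reference(observation: Optional[str], ground_truth_assistant: str) -> str:
    """Return the text a judge would evaluate the response against."""
    # Assumption: an observation counts as present whenever it is not None.
    if observation is not None:
        return observation
    return ground_truth_assistant
```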