The Gaussia base class is the entry point for evaluation. It defines the processing flow and delegates the per-batch work to your subclass. This is the Template Method pattern: the base class owns the lifecycle, and you implement one method.

The batch method

Extend Gaussia and implement the abstract batch method. The base class calls it once per unit of work and collects whatever you push to this.metrics.
import { Gaussia, type BatchInput } from "@gaussia/sdk";

class CountBatches extends Gaussia<number> {
  protected async batch({ sessionId, batch }: BatchInput) {
    this.metrics.push(batch.length);
  }
}
The type parameter on Gaussia<number> is the metric type collected in this.metrics and returned by run.

Batch input

Each call to batch receives:
  • sessionId (string): Identifier of the session being evaluated.
  • context (string): The session context (for example, the source document or system context).
  • assistantId (string): Identifier of the assistant under evaluation.
  • batch (Batch[]): The conversation batches for this unit of work.
  • language (string | null): The session language, or null when unspecified.
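To make the shape concrete, the fields above can be mirrored in a local interface. This is only a sketch: SketchBatchInput, SketchBatch, and describeInput are hypothetical names that are not part of @gaussia/sdk, and the contents of a batch entry are left opaque because the real Batch type is defined on the Schemas page.

```typescript
// Stand-in types for illustration only; the real BatchInput and Batch
// come from @gaussia/sdk. A batch entry is kept opaque on purpose.
type SketchBatch = Record<string, unknown>;

interface SketchBatchInput {
  sessionId: string;
  context: string;
  assistantId: string;
  batch: SketchBatch[];
  language: string | null;
}

// Builds a one-line label for logging and handles the null language case.
function describeInput(input: SketchBatchInput): string {
  const language = input.language ?? "unspecified";
  return `${input.sessionId}/${input.assistantId}: ${input.batch.length} batch(es), language=${language}`;
}

const example: SketchBatchInput = {
  sessionId: "s-1",
  context: "Refund policy document",
  assistantId: "a-1",
  batch: [{ note: "placeholder entry" }],
  language: null,
};

console.log(describeInput(example));
```

Because language may be null, code inside batch should pick an explicit fallback rather than assume a value is present.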

Running an evaluation

Call the static run method with your retriever class and its config. The base class constructs the retriever, loads the dataset, processes it, and returns the collected metrics.
const totals = await CountBatches.run(MyRetriever, retrieverConfig);
run accepts an optional third argument for options such as a Logger.
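The lifecycle that run drives can be pictured with a small self-contained stand-in. Nothing below is the SDK's implementation: MiniEvaluator, SketchSession, SketchRetriever, and InMemoryRetriever are hypothetical names, the retriever config and options arguments are omitted, and the dataset is a plain array. It only illustrates how a base class can own the flow while the subclass supplies batch.

```typescript
// Illustrative stand-ins only: none of these names come from @gaussia/sdk.
interface SketchSession {
  sessionId: string;
  turns: string[];
}

interface SketchRetriever {
  loadDataset(): SketchSession[];
}

abstract class MiniEvaluator<M> {
  protected metrics: M[] = [];

  // The single step a subclass must supply.
  protected abstract batch(session: SketchSession): Promise<void>;

  // Optional finalization step; a no-op unless overridden.
  protected async onProcessComplete(): Promise<void> {}

  // Template method: construct, load, process each unit, finalize, return.
  static async run<M>(
    this: new () => MiniEvaluator<M>,
    RetrieverClass: new () => SketchRetriever,
  ): Promise<M[]> {
    const evaluator = new this();
    const retriever = new RetrieverClass();
    for (const session of retriever.loadDataset()) {
      await evaluator.batch(session);
    }
    await evaluator.onProcessComplete();
    return evaluator.metrics;
  }
}

// The metric type parameter (number) is what run resolves to, as an array.
class CountTurns extends MiniEvaluator<number> {
  protected async batch(session: SketchSession): Promise<void> {
    this.metrics.push(session.turns.length);
  }
}

class InMemoryRetriever implements SketchRetriever {
  loadDataset(): SketchSession[] {
    return [
      { sessionId: "s-1", turns: ["hi", "hello"] },
      { sessionId: "s-2", turns: ["bye"] },
    ];
  }
}

const pending = CountTurns.run(InMemoryRetriever);
pending.then((totals) => console.log(totals)); // one metric per session
```

The subclass never calls batch itself; it only describes what to do with one unit of work, which is what keeps evaluators small.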

Iteration levels

The retriever’s iterationLevel controls how the dataset is consumed:
  • full_dataset (default): loadDataset returns Dataset[], processed in order.
  • stream_sessions: loadDataset returns an async iterable of Dataset, processed as they arrive.
  • stream_batches: loadDataset returns an async iterable of StreamedBatch, processed one batch at a time.
If the retriever returns an async iterable while the level is full_dataset, the constructor throws a RetrieverError. Set a streaming level when you stream.
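The rule about mismatched levels can be sketched as a small dispatch function. This is an assumption-laden illustration, not the SDK's code: consume, isAsyncIterable, SketchLevel, and SketchRetrieverError are hypothetical names, and the check here happens when the data is consumed rather than in a constructor.

```typescript
type SketchLevel = "full_dataset" | "stream_sessions" | "stream_batches";

// Hypothetical stand-in for the SDK's RetrieverError.
class SketchRetrieverError extends Error {}

// Type guard: true when the value implements the async iteration protocol.
function isAsyncIterable<T>(value: unknown): value is AsyncIterable<T> {
  return typeof value === "object" && value !== null && Symbol.asyncIterator in value;
}

async function consume<T>(
  level: SketchLevel,
  loaded: T[] | AsyncIterable<T>,
  handle: (item: T) => Promise<void>,
): Promise<void> {
  if (isAsyncIterable<T>(loaded)) {
    // Streaming data is only accepted at a streaming level.
    if (level === "full_dataset") {
      throw new SketchRetrieverError("async iterable returned at full_dataset level");
    }
    for await (const item of loaded) {
      await handle(item);
    }
    return;
  }
  // Plain arrays are processed in order.
  for (const item of loaded) {
    await handle(item);
  }
}

// A tiny async generator standing in for a streaming retriever.
async function* streamSessions(): AsyncGenerator<string> {
  yield "s-1";
  yield "s-2";
}

const seen: string[] = [];
const done = consume("stream_sessions", streamSessions(), async (id) => {
  seen.push(id);
});
done.then(() => console.log(seen));
```

Passing the same generator with "full_dataset" rejects with the stand-in error, which mirrors the mismatch described above.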

Hooks

  • onProcessComplete runs after all batches are processed. Override it to finalize aggregate metrics. The default is a no-op.
  • resolveWeights normalizes per-batch weights to sum to 1, falling back to uniform weights (with a logged warning) when explicit weights are missing or inconsistent.
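The normalization that resolveWeights is described as performing can be approximated as follows. This is a sketch under stated assumptions: normalizeWeights is a hypothetical helper, and treating undefined, negative, non-finite, or all-zero weights as "missing or inconsistent" is a guess at the rules, not the SDK's definition.

```typescript
// Hypothetical helper; the validity rules are assumptions for illustration.
function normalizeWeights(
  weights: Array<number | undefined>,
  warn: (message: string) => void = console.warn,
): number[] {
  const count = weights.length;
  if (count === 0) return [];
  const uniform = (): number[] => weights.map(() => 1 / count);

  // Collect explicit weights, bailing out to uniform on the first bad entry.
  const explicit: number[] = [];
  for (const w of weights) {
    if (w === undefined || !Number.isFinite(w) || w < 0) {
      warn("weights missing or invalid; falling back to uniform weights");
      return uniform();
    }
    explicit.push(w);
  }

  const total = explicit.reduce((sum, w) => sum + w, 0);
  if (total <= 0) {
    warn("weights sum to zero; falling back to uniform weights");
    return uniform();
  }
  // Scale so the result sums to 1.
  return explicit.map((w) => w / total);
}

console.log(normalizeWeights([2, 1, 1])); // [0.5, 0.25, 0.25]
console.log(normalizeWeights([1, undefined, 3])); // three equal weights, after a warning
```

An overridden onProcessComplete is a natural place to apply such weights when turning per-batch metrics into one aggregate value.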

Schemas

Dataset, Batch, and the rest of the data model.

Language models

Call a model from inside your batch implementation.