Hand-writing evaluation datasets is slow and biased toward the cases you already thought of. The @gaussia/sdk/generators subpath turns context documents into validated Dataset[] you can evaluate or optimize against. A BaseGenerator runs a template-method pipeline: load the source into chunks, select chunk groups (one group becomes one Dataset), call your LanguageModel per chunk for structured output, then map that output into validated Dataset and Batch objects.
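The four stages can be pictured as a small template-method class. This is a conceptual sketch only, not the SDK's implementation: `PipelineSketch`, `SketchChunk`, `SketchDataset`, and the `GenerateFn` callback are hypothetical names invented for illustration, and the real `BaseGenerator`, `Chunk`, and `Dataset` types may look different.

```typescript
// Conceptual sketch of a template-method pipeline. Every name here is
// hypothetical; the real BaseGenerator, Chunk, and Dataset types may differ.
interface SketchChunk { id: string; content: string }
interface SketchQuery { query: string; sourceChunkId: string }
interface SketchDataset { assistantId: string; queries: SketchQuery[] }

// Stand-in for a structured-output model call: invoked once per chunk.
type GenerateFn = (chunk: SketchChunk, n: number) => Promise<string[]>;

class PipelineSketch {
  private readonly generate: GenerateFn;

  constructor(generate: GenerateFn) {
    this.generate = generate;
  }

  // Stage 1: load the source into chunks (one chunk per string here).
  protected load(source: string | string[]): SketchChunk[] {
    const parts = Array.isArray(source) ? source : [source];
    return parts.map((content, i) => ({ id: `chunk-${i}`, content }));
  }

  // Stage 2: select chunk groups; each group becomes one dataset.
  protected select(chunks: SketchChunk[]): SketchChunk[][] {
    return [chunks]; // a single group containing every chunk
  }

  // Template method: fixes the order of the stages.
  async run(source: string | string[], assistantId: string, n: number): Promise<SketchDataset[]> {
    const groups = this.select(this.load(source));
    const datasets: SketchDataset[] = [];
    for (const group of groups) {
      const queries: SketchQuery[] = [];
      for (const chunk of group) {
        // Stage 3: call the model once per chunk.
        const generated = await this.generate(chunk, n);
        // Stage 4: map the model output into dataset entries.
        for (const query of generated) {
          queries.push({ query, sourceChunkId: chunk.id });
        }
      }
      datasets.push({ assistantId, queries });
    }
    return datasets;
  }
}

// Usage with a fake model that needs no network access.
const fakeModel: GenerateFn = async (chunk, n) =>
  Array.from({ length: n }, (_, i) => `Question ${i + 1} about ${chunk.id}`);

const sketchResult = new PipelineSketch(fakeModel).run(
  ["refund policy", "2FA setup"],
  "support-bot",
  2,
);
```

Subclasses would override `load` or `select` while `run` keeps the stage order fixed, which is the point of the template-method pattern.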

Generate a dataset

1. Construct a generator with a model

The generator talks only to the LanguageModel interface — bring any adapter or your own implementation.
import { BaseGenerator } from "@gaussia/sdk/generators";
import { createAiSdkAdapter } from "@gaussia/sdk/adapters/ai-sdk";
import { openai } from "@ai-sdk/openai";

const model = createAiSdkAdapter(openai("gpt-4o-mini"));
const generator = new BaseGenerator({ model });

2. Generate from a source

import { StringContextLoader } from "@gaussia/sdk/generators";

const datasets = await generator.generateDataset({
  contextLoader: new StringContextLoader(),
  source: [
    "Northwind refunds are issued to the original payment method within 5 business days. A refund requires the order ID.",
    "Enable two-factor authentication from Settings → Security. Support never asks for your password.",
  ],
  assistantId: "support-bot",
  numQueriesPerChunk: 3,
});
The result is a validated Dataset[], ready to hand to a retriever, an evaluator, or the prompt optimizer.

Request options

generateDataset takes one request object:
contextLoader (ContextLoader, required): Turns source into chunks. Use StringContextLoader (isomorphic) or the Node-only Markdown loader.
source (string | string[], required): The context to generate from. A single string or an array; the loader decides how it becomes chunks.
assistantId (string, required): The assistant id stamped onto every generated Dataset.
numQueriesPerChunk (number, default: 3): Queries generated per chunk in single-turn mode, or turns per conversation in conversation mode.
language (string, default: "english"): Language the model is asked to generate in.
seedExamples (string[], optional): Few-shot examples to steer style and difficulty. Included in the generation prompt.
customSystemPrompt (string, optional): Replaces the default generation system prompt when you need full control.
selectionStrategy (ChunkSelectionStrategy, optional): How chunks are grouped into datasets. Defaults to SequentialStrategy.
conversationMode (boolean, default: false): Generate multi-turn conversations instead of independent single-turn queries.
signal (AbortSignal, optional): Cancel in-flight generation.
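The signal option takes a standard AbortSignal, so cancellation is wired up with an AbortController. The sketch below shows the pattern in isolation: `generateStub` is a hypothetical stand-in for `generator.generateDataset` so the snippet runs without the SDK, and the error the SDK raises on cancellation is not specified here.

```typescript
// Hypothetical stand-in for generator.generateDataset so this runs without the SDK.
// It resolves after `ms` milliseconds unless the signal aborts first.
function generateStub(ms: number, signal?: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => resolve("done"), ms);
    signal?.addEventListener(
      "abort",
      () => {
        clearTimeout(timer); // stop the pending work
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });
}

const controller = new AbortController();

// In real code, pass controller.signal as the `signal` request option.
const pending = generateStub(1_000, controller.signal);

// Cancel from a "Stop" button handler, a timeout, or a route change.
controller.abort();

// Map the two possible endings to a label instead of letting the rejection escape.
const outcome = pending.then(
  () => "completed",
  () => "cancelled",
);
```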

Context loaders

A loader turns a source into a list of Chunks. Loaders are interchangeable at the call site because they share the string | string[] source type.
StringContextLoader treats each input string as one pre-chunked unit: a single string becomes one chunk, and an array becomes one chunk per element. It uses no Node built-ins, so it also runs in the browser.
import { StringContextLoader } from "@gaussia/sdk/generators";

const loader = new StringContextLoader({ idPrefix: "kb" }); // idPrefix default "string"
You can also implement ContextLoader yourself — any object with loadChunks(source): Promise<Chunk[]> plugs in.
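As an illustration of a custom loader, here is a minimal sketch that splits each input string into paragraphs. The `Chunk` and `ContextLoader` interfaces are redeclared locally with an assumed `{ id, content }` shape so the snippet is self-contained; the SDK's exported types may carry different or additional fields, and `ParagraphLoader` is a hypothetical name.

```typescript
// Assumed shapes, redeclared so the sketch compiles without the SDK.
// The real Chunk type may have more fields than id and content.
interface Chunk { id: string; content: string }
interface ContextLoader {
  loadChunks(source: string | string[]): Promise<Chunk[]>;
}

// Hypothetical loader: splits every input string on blank lines,
// producing one chunk per non-empty paragraph.
class ParagraphLoader implements ContextLoader {
  async loadChunks(source: string | string[]): Promise<Chunk[]> {
    const inputs = Array.isArray(source) ? source : [source];
    const chunks: Chunk[] = [];
    for (const text of inputs) {
      for (const paragraph of text.split(/\n\s*\n/)) {
        const content = paragraph.trim();
        if (content.length === 0) continue; // skip empty paragraphs
        chunks.push({ id: `para-${chunks.length}`, content });
      }
    }
    return chunks;
  }
}

const paragraphChunks = new ParagraphLoader().loadChunks(
  "Refunds take 5 business days.\n\nEnable 2FA from Settings.",
);
```

In real code you would import the types from the SDK instead of redeclaring them, then pass an instance as contextLoader.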

Selection strategies

A strategy decides how chunks become datasets. Pass one as selectionStrategy.
SequentialStrategy, the default, groups every chunk into a single dataset containing queries from all chunks.
import { SequentialStrategy } from "@gaussia/sdk/generators";

// Inside the generateDataset request object:
selectionStrategy: new SequentialStrategy(),

Single-turn and multi-turn

// numQueriesPerChunk independent query/answer pairs per chunk.
// Output follows GeneratedQueriesOutput.
const datasets = await generator.generateDataset({
  contextLoader: new StringContextLoader(),
  source: knowledgeBase,
  assistantId: "support-bot",
  numQueriesPerChunk: 3,
});
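The request above is the single-turn form. Multi-turn generation uses the same request with conversationMode set to true. The sketch below shows only that difference: `RequestSketch` is a hypothetical local mirror of a few documented options, declared so the snippet type-checks without the SDK, and it omits contextLoader for brevity.

```typescript
// Hypothetical local mirror of some documented request options.
// The SDK's own request type may differ; contextLoader is omitted here.
interface RequestSketch {
  source: string | string[];
  assistantId: string;
  numQueriesPerChunk?: number;
  conversationMode?: boolean;
}

// Single-turn: three independent query/answer pairs per chunk.
const singleTurn: RequestSketch = {
  source: ["Refunds are issued within 5 business days."],
  assistantId: "support-bot",
  numQueriesPerChunk: 3,
};

// Multi-turn: same request with conversationMode enabled.
// numQueriesPerChunk now means turns per conversation.
const multiTurn: RequestSketch = { ...singleTurn, conversationMode: true };
```

In conversation mode the output follows GeneratedConversationOutput rather than GeneratedQueriesOutput.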

Steering the output

Use seedExamples for few-shot guidance, or customSystemPrompt to replace the generation prompt entirely.
const datasets = await generator.generateDataset({
  contextLoader: new StringContextLoader(),
  source: knowledgeBase,
  assistantId: "support-bot",
  seedExamples: [
    "Q: How long does a refund take? A: Refunds reach your original payment method within 5 business days.",
  ],
  // customSystemPrompt: "You write terse, factual support questions...",
});
The default prompts are also exported (DEFAULT_SYSTEM_PROMPT, DEFAULT_CONVERSATION_PROMPT, fillTemplate, buildSeedExamplesSection) if you want to build on them. See Schemas for the GeneratedQueriesOutput and GeneratedConversationOutput shapes.

Runtime

The generators bundle pulls in zero AI-SDK bytes, and the default (browser) bundle imports no Node-only built-ins. You bring a LanguageModel; the generators add no vendor dependency of their own. All model access flows through generateObject for validated structured output — there is no free-text JSON fallback.
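Having no free-text JSON fallback means a response that does not match the expected structure is an error, not something to be salvaged by parsing prose. A minimal sketch of that contract, assuming a hand-written validator: `parseQueriesStrict` and the `{ queries: [{ query, answer }] }` shape are hypothetical and are not the SDK's GeneratedQueriesOutput schema.

```typescript
// Hypothetical output shape for illustration; not the SDK's GeneratedQueriesOutput.
interface QueriesSketch {
  queries: { query: string; answer: string }[];
}

// Validate an already-structured value. Anything that does not match is
// rejected outright; no attempt is made to recover JSON from free text.
function parseQueriesStrict(value: unknown): QueriesSketch {
  if (typeof value !== "object" || value === null) {
    throw new TypeError("expected an object");
  }
  const queries = (value as { queries?: unknown }).queries;
  if (!Array.isArray(queries)) {
    throw new TypeError("expected a `queries` array");
  }
  const parsed = queries.map((item: unknown, i: number) => {
    const q = item as { query?: unknown; answer?: unknown } | null;
    if (!q || typeof q.query !== "string" || typeof q.answer !== "string") {
      throw new TypeError(`queries[${i}] must have string query and answer`);
    }
    return { query: q.query, answer: q.answer };
  });
  return { queries: parsed };
}

// A well-formed structured value passes.
const validOutput = parseQueriesStrict({
  queries: [{ query: "How long do refunds take?", answer: "5 business days." }],
});

// A JSON string is free text, not structured output, so it is rejected.
let rejected = false;
try {
  parseQueriesStrict('{"queries": []}');
} catch {
  rejected = true;
}
```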

Next steps

Optimize a prompt against generated data

Feed a generated Dataset[] straight into the GEPA optimizer.