The @gaussia/sdk/generators subpath turns context documents into a validated Dataset[] that you can evaluate or optimize against.
A BaseGenerator runs a template-method pipeline: load the source into chunks, select chunk groups (one group becomes one Dataset), call your LanguageModel per chunk for structured output, then map that output into validated Dataset and Batch objects.
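The four pipeline steps can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: `SketchGenerator`, `EchoGenerator`, `SketchChunk`, and `SketchDataset` are hypothetical names, not the SDK's classes or types, and the model call is replaced by a canned string. It omits Batch objects entirely and only shows the fixed step order that a template method enforces.

```typescript
// Hypothetical shapes for illustration; the SDK's real Chunk and Dataset types may differ.
interface SketchChunk { content: string }
interface SketchDataset { context: string; queries: string[] }

abstract class SketchGenerator {
  // Template method: the step order is fixed here, subclasses supply the steps.
  async generate(source: string | string[]): Promise<SketchDataset[]> {
    const chunks = await this.load(source);      // 1. load the source into chunks
    const groups = this.select(chunks);          // 2. one group becomes one dataset
    const datasets: SketchDataset[] = [];
    for (const group of groups) {
      const queries: string[] = [];
      for (const chunk of group) {
        // 3. one generation call per chunk
        queries.push(...(await this.generateForChunk(chunk)));
      }
      datasets.push(this.toDataset(group, queries)); // 4. map and check the output
    }
    return datasets;
  }

  protected abstract load(source: string | string[]): Promise<SketchChunk[]>;
  protected abstract select(chunks: SketchChunk[]): SketchChunk[][];
  protected abstract generateForChunk(chunk: SketchChunk): Promise<string[]>;

  // Minimal stand-in for validation: refuse to build an empty dataset.
  protected toDataset(group: SketchChunk[], queries: string[]): SketchDataset {
    if (queries.length === 0) throw new Error("no queries generated");
    return { context: group.map((c) => c.content).join("\n"), queries };
  }
}

class EchoGenerator extends SketchGenerator {
  protected async load(source: string | string[]): Promise<SketchChunk[]> {
    const parts = Array.isArray(source) ? source : [source];
    return parts.map((content) => ({ content }));
  }
  protected select(chunks: SketchChunk[]): SketchChunk[][] {
    return [chunks]; // a single group, so a single dataset
  }
  protected async generateForChunk(chunk: SketchChunk): Promise<string[]> {
    // Stands in for a structured-output model call.
    return [`What does this passage say? ${chunk.content}`];
  }
}
```

Calling `new EchoGenerator().generate(["a", "b"])` in this sketch resolves to one dataset holding one query per chunk.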
Generate a dataset
Construct a generator with a model
The generator talks only to the LanguageModel interface; bring any adapter or your own implementation.
Generate from a source
The result is a Dataset[], ready to hand to a retriever, an evaluator, or the prompt optimizer.
Request options
generateDataset takes one request object:
Turns source into chunks. Use StringContextLoader (isomorphic) or the Node-only LocalMarkdownLoader.
The context to generate from. A single string or an array; the loader decides how it becomes chunks.
The assistant id stamped onto every generated Dataset.
Queries generated per chunk in single-turn mode, or turns per conversation in conversation mode.
Language the model is asked to generate in.
Few-shot examples to steer style and difficulty. Included in the generation prompt.
Replaces the default generation system prompt when you need full control.
How chunks are grouped into datasets. Defaults to SequentialStrategy.
Generate multi-turn conversations instead of independent single-turn queries.
Cancel in-flight generation.
Context loaders
A loader turns a source into a list of Chunks. Loaders are interchangeable at the call site because they share the string | string[] source type.
- StringContextLoader (isomorphic)
- LocalMarkdownLoader (Node)
StringContextLoader treats each input string as one pre-chunked unit: one string becomes one chunk, an array becomes one chunk per element. It uses no Node built-ins, so it runs in the browser.
You can also implement ContextLoader yourself: any object with loadChunks(source): Promise<Chunk[]> plugs in.
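As a rough illustration of a hand-written loader, the sketch below splits each input on blank lines so that every paragraph becomes its own chunk. Only the `loadChunks(source): Promise<Chunk[]>` contract comes from the text above; the `{ content: string }` chunk shape, the locally declared interfaces, and the name `paragraphLoader` are assumptions, so check the SDK's exported types before relying on them.

```typescript
// Assumed shapes, declared locally so the sketch is self-contained.
// The SDK's real Chunk type may carry more (or different) fields.
interface Chunk { content: string }
interface ContextLoader {
  loadChunks(source: string | string[]): Promise<Chunk[]>;
}

// Hypothetical loader: one chunk per non-empty paragraph.
const paragraphLoader: ContextLoader = {
  async loadChunks(source) {
    const inputs = Array.isArray(source) ? source : [source];
    const chunks: Chunk[] = [];
    for (const text of inputs) {
      // A blank line (optionally containing whitespace) separates paragraphs.
      for (const paragraph of text.split(/\n\s*\n/)) {
        const content = paragraph.trim();
        if (content.length > 0) chunks.push({ content });
      }
    }
    return chunks;
  },
};
```

Because it accepts the same string | string[] source as the built-in loaders, a loader like this could be swapped in without changing the rest of the request.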
Selection strategies
A strategy decides how chunks become datasets. Pass one as selectionStrategy.
- SequentialStrategy (default)
- RandomSamplingStrategy
- Custom strategy
SequentialStrategy groups every chunk into a single dataset containing queries from all chunks.
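To make the grouping idea concrete, here is a minimal sketch of two strategies. The `SelectionStrategy` interface with a `select(chunks)` method is an assumption made for this example, as are the names `singleGroup` and `windowed`; the SDK's actual strategy contract may use a different method name or signature. The first strategy reproduces the single-group behaviour described above, and the second is a hypothetical custom strategy that produces one dataset per fixed-size window of chunks.

```typescript
interface Chunk { content: string }

// Assumed interface: each inner array is one group, and one group becomes one dataset.
interface SelectionStrategy {
  select(chunks: Chunk[]): Chunk[][];
}

// All chunks in one group, hence one dataset (no group at all for empty input).
const singleGroup: SelectionStrategy = {
  select: (chunks) => (chunks.length > 0 ? [chunks] : []),
};

// Hypothetical custom strategy: consecutive windows of `size` chunks.
function windowed(size: number): SelectionStrategy {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  return {
    select(chunks) {
      const groups: Chunk[][] = [];
      for (let i = 0; i < chunks.length; i += size) {
        groups.push(chunks.slice(i, i + size)); // the last window may be shorter
      }
      return groups;
    },
  };
}
```

With five chunks, `windowed(2)` in this sketch yields three groups of sizes 2, 2, and 1, while `singleGroup` yields one group of five.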
Single-turn and multi-turn
Steering the output
Use seedExamples for few-shot guidance, or customSystemPrompt to replace the generation prompt entirely.
The default prompts and template helpers are exported (DEFAULT_SYSTEM_PROMPT, DEFAULT_CONVERSATION_PROMPT, fillTemplate, buildSeedExamplesSection) if you want to build on them. See Schemas for the GeneratedQueriesOutput and GeneratedConversationOutput shapes.
Runtime
The generators bundle pulls in zero AI-SDK bytes, and the default (browser) bundle imports no Node-only built-ins. You bring a LanguageModel; the generators add no vendor dependency of their own. All model access flows through generateObject for validated structured output; there is no free-text JSON fallback.
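The "validated structured output, no free-text fallback" idea can be sketched with a stub model for tests. The request shape passed to `generateObject`, the `Validator` type, `SketchQueries`, `validateQueries`, and `stubModel` are all hypothetical and declared locally; the real LanguageModel interface and the GeneratedQueriesOutput schema may look quite different. The point is only that a malformed response surfaces as an error instead of being parsed leniently.

```typescript
// Hypothetical types; not the SDK's actual LanguageModel contract.
type Validator<T> = (value: unknown) => T; // throws when the value has the wrong shape
interface LanguageModel {
  generateObject<T>(request: { prompt: string; validate: Validator<T> }): Promise<T>;
}

interface SketchQueries { queries: string[] }

// Accepts only an object with a `queries` array of strings.
function validateQueries(value: unknown): SketchQueries {
  const queries = (value as { queries?: unknown } | null)?.queries;
  if (!Array.isArray(queries) || !queries.every((q) => typeof q === "string")) {
    throw new TypeError("expected { queries: string[] }");
  }
  return { queries: queries as string[] };
}

// Test double: returns a canned value, always routed through the validator.
function stubModel(canned: unknown): LanguageModel {
  return {
    async generateObject<T>(request: { prompt: string; validate: Validator<T> }): Promise<T> {
      // No attempt to recover structure from free text: invalid output rejects.
      return request.validate(canned);
    },
  };
}
```

In this sketch, `stubModel({ queries: ["q1"] })` resolves with the validated object, while `stubModel("free text")` rejects with a TypeError.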
Next steps
Optimize a prompt against generated data
Feed a generated Dataset[] straight into the GEPA optimizer.