Overview
The Generators module creates synthetic Dataset objects from context documents. Use it to bootstrap evaluations when you don’t yet have real conversation data.
How it works
- A context loader reads and chunks your documents
- A chunk selection strategy picks which chunks to process
- The generator uses an LLM to create realistic question-answer (QA) pairs from each selected chunk
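The three steps above can be sketched end to end as follows. This is a minimal illustration, not the module's API: QAPair, load_chunks, fake_llm and generate_dataset are hypothetical names, the loader simply splits text on blank lines, and the "LLM" is a placeholder function that builds a trivial question from each chunk's first sentence.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class QAPair:
    question: str
    answer: str
    source_chunk: str


def load_chunks(text: str) -> List[str]:
    # Step 1 (loader): split a document into chunks on blank lines
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def fake_llm(chunk: str) -> Tuple[str, str]:
    # Hypothetical stand-in for an LLM call; a real generator would prompt a model
    topic = chunk.split(".")[0].strip()
    return f"What does the documentation say about '{topic}'?", chunk


def generate_dataset(
    text: str,
    select: Callable[[List[str]], Iterable[str]],
    generate: Callable[[str], Tuple[str, str]],
) -> List[QAPair]:
    chunks = load_chunks(text)
    pairs: List[QAPair] = []
    # Step 2 (selection strategy) picks chunks; step 3 (generator) turns each into a QA pair
    for chunk in select(chunks):
        question, answer = generate(chunk)
        pairs.append(QAPair(question, answer, chunk))
    return pairs


doc = "Install the package with pip.\n\nSet the API key in an environment variable."
dataset = generate_dataset(doc, select=lambda cs: cs, generate=fake_llm)
for pair in dataset:
    print(pair.question)
```

Keeping the selection strategy and the generator as plain callables makes each step swappable, which matches the separation between loaders, strategies and the generator described above.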
Usage
Context loaders
LocalMarkdownLoader
Reads markdown files and splits them into chunks based on headers and size.
Custom loader
Implement BaseContextLoader to load documents from custom sources.
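The abstract methods of BaseContextLoader are not documented here, so the sketch below defines its own stand-in base class, ContextLoaderSketch, with a single load() method that returns text chunks. The real interface may differ. InMemoryMarkdownLoader is likewise hypothetical: it chunks markdown held in memory (for example, fetched from a database) by splitting before each header line and then cutting oversized sections into max_chars pieces, a rough imitation of the header-and-size chunking that LocalMarkdownLoader is described as doing.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List


class ContextLoaderSketch(ABC):
    """Hypothetical stand-in for BaseContextLoader; the real interface may differ."""

    @abstractmethod
    def load(self) -> List[str]:
        """Return the document chunks to feed into chunk selection."""


class InMemoryMarkdownLoader(ContextLoaderSketch):
    """Chunks markdown documents passed in as a {name: text} mapping."""

    def __init__(self, documents: Dict[str, str], max_chars: int = 500):
        self.documents = documents
        self.max_chars = max_chars

    def load(self) -> List[str]:
        chunks: List[str] = []
        for text in self.documents.values():
            # Split immediately before every markdown header line (# to ######)
            sections = re.split(r"(?m)^(?=#{1,6} )", text)
            for section in sections:
                section = section.strip()
                # Naively cut oversized sections into max_chars pieces (may split words)
                for start in range(0, len(section), self.max_chars):
                    chunks.append(section[start:start + self.max_chars])
        return chunks


loader = InMemoryMarkdownLoader({"guide.md": "# Install\nUse pip.\n# Configure\nSet the API key."})
chunks = loader.load()
print(chunks)
```

A production loader would more likely split on sentence or paragraph boundaries than on raw character counts, so that no chunk ends mid-word.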
Chunk selection strategies
| Strategy | Description |
|---|---|
| SequentialStrategy | Process all chunks in order (default) |
| RandomSamplingStrategy | Randomly sample chunks multiple times |
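The two strategies in the table can be sketched as below. These are illustrative classes, not SequentialStrategy and RandomSamplingStrategy themselves: the class names, the select() method, and the sample_size, num_rounds and seed parameters are assumptions. Random sampling here draws without replacement within each round, so a chunk can appear again only in a later round.

```python
import random
from typing import List, Optional


class SequentialSketch:
    """Returns every chunk once, in order (mirrors the described default)."""

    def select(self, chunks: List[str]) -> List[str]:
        return list(chunks)


class RandomSamplingSketch:
    """Draws sample_size chunks at random, num_rounds times (hypothetical parameters)."""

    def __init__(self, sample_size: int, num_rounds: int, seed: Optional[int] = None):
        self.sample_size = sample_size
        self.num_rounds = num_rounds
        self.rng = random.Random(seed)  # seeded for reproducible datasets

    def select(self, chunks: List[str]) -> List[str]:
        picked: List[str] = []
        k = min(self.sample_size, len(chunks))
        for _ in range(self.num_rounds):
            # Without replacement inside a round; later rounds may repeat chunks
            picked.extend(self.rng.sample(chunks, k))
        return picked


chunks = ["a", "b", "c", "d"]
print(SequentialSketch().select(chunks))
sampled = RandomSamplingSketch(sample_size=2, num_rounds=3, seed=42).select(chunks)
print(sampled)
```

Sampling repeatedly lets a large corpus yield several QA pairs per chunk without processing every chunk, while the sequential approach guarantees full coverage.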