Architecture
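Gaussia's architecture is built around three pieces: a Retriever that loads conversation data, the Gaussia base class that iterates over it, and metric implementations that process each batch. Each piece is designed to be subclassed or swapped out.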
Overview
Data Flow
The core data flow in Gaussia is:
1. The Retriever loads your conversation data (as list[Dataset], Iterator[Dataset], or Iterator[StreamedBatch]).
2. The Gaussia base class iterates through the datasets.
3. Metric implementations process each conversation batch.
4. Results are collected in self.metrics.
Load Data
Retriever.load_dataset() returns list[Dataset] (or an iterator of Dataset or StreamedBatch objects)
Process Datasets
Gaussia._process() iterates through datasets
Compute Metrics
Metric.batch() processes each conversation
Collect Results
Results stored in self.metrics
Gaussia Base Class
All metrics inherit from Gaussia (gaussia/core/base.py):
from abc import ABC, abstractmethod
from typing import Type

from gaussia.core.retriever import Retriever
from gaussia.schemas.common import Batch


class Gaussia(ABC):
    def __init__(self, retriever: Type[Retriever], verbose: bool = False, **kwargs):
        # Extra keyword arguments are forwarded to the retriever's constructor.
        self.retriever = retriever(**kwargs)
        self.metrics = []
        self.verbose = verbose

    @abstractmethod
    def batch(self, session_id: str, context: str, assistant_id: str,
              batch: list[Batch], language: str | None) -> None:
        """Process a batch of conversations. Implemented by each metric."""
        pass

    @classmethod
    def run(cls, retriever: Type[Retriever], **kwargs) -> list:
        """One-shot execution: instantiate and process."""
        instance = cls(retriever, **kwargs)
        instance._process()
        return instance.metrics
Retriever
Abstract base class for data loading:
from abc import ABC, abstractmethod
from typing import Iterator

from gaussia.schemas.common import Dataset
# IterationLevel and StreamedBatch are Gaussia types; their import paths are not shown in this section.


class Retriever(ABC):
    def __init__(self, **kwargs):
        pass

    @property
    def iteration_level(self) -> IterationLevel:
        return IterationLevel.FULL_DATASET  # default

    @abstractmethod
    def load_dataset(self) -> list[Dataset] | Iterator[Dataset] | Iterator[StreamedBatch]:
        """Load and return datasets for evaluation."""
        pass
Data Structures
Dataset: A complete conversation session
class Dataset(BaseModel):
    session_id: str            # Unique session identifier
    assistant_id: str          # ID of the assistant being evaluated
    language: str | None       # Language code (e.g., "english")
    context: str               # System context/instructions
    conversation: list[Batch]  # List of Q&A interactions
Batch: A single Q&A interaction
class Batch(BaseModel):
    qa_id: str                          # Unique interaction ID
    query: str                          # User question
    assistant: str                      # Assistant response
    ground_truth_assistant: str | None  # Expected response
    observation: str | None             # Additional notes
    weight: float | None                # Importance weight
    agentic: dict | None                # Tool usage metadata
    ground_truth_agentic: dict | None   # Expected tool usage
    logprobs: dict | None               # Log probabilities
Metric Architecture
Each metric follows this pattern:
from gaussia.core.base import Gaussia


class MyMetric(Gaussia):
    def __init__(self, retriever, verbose=False, **kwargs):
        super().__init__(retriever, verbose, **kwargs)
        # Initialize metric-specific components

    def batch(self, session_id, context, assistant_id, batch, language):
        # Process the batch and compute metrics
        result = self._compute(batch)
        self.metrics.append(result)
Statistical Modes
Gaussia supports two statistical approaches:
Frequentist mode returns point estimates (floats):
from gaussia.statistical import FrequentistMode

metrics = Toxicity.run(
    MyRetriever,
    statistical_mode=FrequentistMode(),
)
# Returns: metric.group_profiling.frequentist.DIDT = 0.33
Bayesian mode returns full posterior distributions:
from gaussia.statistical import BayesianMode

bayesian = BayesianMode(
    mc_samples=5000,
    ci_level=0.95,
)
metrics = Toxicity.run(
    MyRetriever,
    statistical_mode=bayesian,
)
# Returns: metric.group_profiling.bayesian.summary['DIDT']
# {mean: 0.17, ci_low: 0.08, ci_high: 0.27}
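The difference between the two output shapes can be illustrated on a plain proportion. The sketch below is not Gaussia's implementation and does not compute DIDT: it assumes a uniform Beta(1, 1) prior and a simple percentile interval, and the function names are hypothetical. Only the parameter names `mc_samples` and `ci_level` and the keys `mean`, `ci_low`, and `ci_high` are taken from the example above.

```python
import random


def frequentist_rate(hits: int, total: int) -> float:
    """Point estimate: the observed proportion."""
    return hits / total


def bayesian_rate(hits: int, total: int, mc_samples: int = 5000,
                  ci_level: float = 0.95, seed: int = 0) -> dict[str, float]:
    """Posterior summary under a uniform Beta(1, 1) prior, by Monte Carlo."""
    rng = random.Random(seed)
    # With a Beta(1, 1) prior, the posterior is Beta(1 + hits, 1 + misses).
    draws = sorted(rng.betavariate(1 + hits, 1 + total - hits)
                   for _ in range(mc_samples))
    tail = (1 - ci_level) / 2
    low = draws[int(tail * mc_samples)]
    high = draws[min(mc_samples - 1, int((1 - tail) * mc_samples))]
    return {"mean": sum(draws) / mc_samples, "ci_low": low, "ci_high": high}


point = frequentist_rate(3, 10)   # a single float
summary = bayesian_rate(3, 10)    # mean plus a credible interval
```

The frequentist result is one number, while the Bayesian result also carries an interval whose width reflects how much data was observed.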
Module Structure
gaussia/
├── core/
│   ├── base.py            # Gaussia base class
│   ├── retriever.py       # Retriever abstract class
│   ├── guardian.py        # Guardian interface (bias detection)
│   ├── sentiment.py       # Sentiment analyzer interface
│   ├── loader.py          # Toxicity loader interface
│   └── extractor.py       # Group extractor interface
├── metrics/
│   ├── context.py         # Context metric
│   ├── conversational.py  # Conversational metric
│   ├── toxicity.py        # Toxicity metric
│   ├── bias.py            # Bias metric
│   ├── humanity.py        # Humanity metric
│   ├── best_of.py         # BestOf metric
│   ├── agentic.py         # Agentic metric
│   ├── vision.py          # Vision metrics
│   └── regulatory.py      # Regulatory metric
├── schemas/
│   ├── common.py          # Dataset, Batch schemas
│   └── ...                # Metric-specific schemas
├── statistical/
│   ├── base.py            # StatisticalMode interface
│   ├── frequentist.py     # Frequentist implementation
│   └── bayesian.py        # Bayesian implementation
├── generators/            # Test dataset generation
├── llm/                   # LLM integration (Judge)
├── guardians/             # Guardian implementations
├── extractors/            # Group extractor implementations
└── loaders/               # Toxicity lexicon loaders
Extension Points
Gaussia is designed for extensibility:
Component            Interface         Purpose
Retriever            load_dataset()    Load custom data sources
Guardian             is_biased()       Custom bias detection
SentimentAnalyzer    infer()           Custom sentiment analysis
ToxicityLoader       load()            Custom toxicity lexicons
BaseGroupExtractor   detect_one()      Custom group detection
StatisticalMode      Various methods   Custom statistical analysis
Next Steps
Retriever: Create custom retrievers for any data source
Dataset & Batch: Understand data structures
Statistical Modes: Frequentist vs Bayesian approaches