Welcome to Gaussia

Gaussia is a performance-measurement library developed by Gaussia Labs for evaluating AI models and assistants. It provides comprehensive metrics for fairness, toxicity, bias, conversational quality, and more.

Why Gaussia?

As AI systems become increasingly integrated into our daily lives, ensuring they behave fairly, safely, and effectively is paramount. Gaussia provides:
  • Fairness Evaluation: Detect and measure bias across protected attributes
  • Toxicity Analysis: Identify toxic language patterns with demographic profiling
  • Conversational Quality: Evaluate dialogue using Grice’s Maxims
  • Context Awareness: Measure how well responses align with provided context
  • Emotional Intelligence: Analyze emotional depth and human-likeness
  • Model Comparison: Run tournament-style evaluations between multiple assistants
  • Agent Evaluation: Measure agent correctness with pass@K metrics
  • Vision Evaluation: Detect VLM hallucinations and measure similarity
  • Regulatory Compliance: Evaluate responses against regulatory corpus
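The pass@K idea behind agent evaluation can be made concrete with the standard unbiased estimator (a general formula; Gaussia's exact implementation may differ): given n attempts at a task of which c are correct, pass@k is the probability that at least one of k randomly drawn attempts is correct.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n attempts, c of which
    are correct, is itself correct."""
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any k-sample
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # average per-attempt pass rate when k=1
```

With k=1 this reduces to the plain pass rate c/n; larger k rewards agents that succeed at least occasionally across repeated attempts.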

Key Features

  • Multiple Metrics: Nine specialized metrics for comprehensive AI evaluation
  • Statistical Modes: Choose between Frequentist and Bayesian statistical approaches
  • Test Generation: Generate synthetic test datasets from your documentation
  • Streaming Support: Process datasets in full, by session, or by individual QA batch
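The three streaming granularities can be sketched as a small generator (names and data shapes are illustrative, not Gaussia's actual API): "full" yields every QA pair at once, "session" yields one conversation at a time, and "batch" yields individual QA pairs.

```python
def iter_units(datasets: list[dict], mode: str = "batch"):
    """Yield evaluation units at one of three granularities.

    Each dataset is assumed to be a dict with a "conversation" list
    of QA pairs (a simplified stand-in for Gaussia's Dataset/Batch).
    """
    if mode == "full":
        # One unit containing every QA pair across all datasets.
        yield [qa for d in datasets for qa in d["conversation"]]
    elif mode == "session":
        # One unit per session/conversation.
        for d in datasets:
            yield list(d["conversation"])
    elif mode == "batch":
        # One unit per individual QA pair.
        for d in datasets:
            for qa in d["conversation"]:
                yield [qa]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Finer granularities trade throughput for lower memory use and earlier partial results.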

Quick Example

from gaussia.metrics.toxicity import Toxicity
from gaussia.core.retriever import Retriever
from gaussia.schemas.common import Dataset, Batch

# Define a custom retriever to load your data
class MyRetriever(Retriever):
    def load_dataset(self) -> list[Dataset]:
        return [
            Dataset(
                session_id="session-1",
                assistant_id="my-assistant",
                language="english",
                context="",
                conversation=[
                    Batch(
                        qa_id="q1",
                        query="Tell me about AI safety",
                        assistant="AI safety is important...",
                    )
                ]
            )
        ]

# Run the toxicity metric
results = Toxicity.run(
    MyRetriever,
    group_prototypes={
        "gender": ["women", "men", "female", "male"],
        "race": ["Asian", "African", "European"],
    },
    verbose=True,
)

# Analyze results
for metric in results:
    print(f"DIDT Score: {metric.group_profiling.frequentist.DIDT}")

Architecture Overview

Gaussia follows a simple yet powerful architecture:
  1. Load Data: Retriever.load_dataset() returns list[Dataset]
  2. Process: Gaussia._process() iterates datasets
  3. Evaluate: Metric.batch() processes each conversation
  4. Results: Collected in self.metrics

All metrics inherit from the Gaussia base class and implement the batch() method to process conversation batches. Users provide data through custom Retriever implementations.
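The four steps above can be sketched as a toy pipeline. This is an illustrative re-implementation of the described flow, not Gaussia's source; the placeholder scoring in batch() stands in for a real metric.

```python
from dataclasses import dataclass

@dataclass
class Batch:
    qa_id: str
    query: str
    assistant: str

@dataclass
class Dataset:
    session_id: str
    conversation: list

class Retriever:
    """Users subclass this and implement load_dataset()."""
    def load_dataset(self) -> list:
        raise NotImplementedError

class ToyLengthMetric:
    """Illustrative metric: scores each QA batch by response word count."""
    def __init__(self, retriever: Retriever):
        self.retriever = retriever
        self.metrics = []

    def batch(self, qa: Batch) -> int:
        # Placeholder per-batch score; a real metric would analyze the text.
        return len(qa.assistant.split())

    def _process(self) -> list:
        # Mirrors the flow: load -> iterate -> evaluate -> collect.
        for ds in self.retriever.load_dataset():
            for qa in ds.conversation:
                self.metrics.append(self.batch(qa))
        return self.metrics
```

A custom retriever plugs into this the same way MyRetriever does in the Quick Example: the metric never needs to know where the data came from.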

Next Steps

  • Quickstart: Get started with Gaussia in minutes
  • Installation: Install Gaussia and dependencies
  • Core Concepts: Learn the fundamental concepts
  • Metrics: Explore available metrics