
Overview

Every Gaussia metric supports two statistical computation modes. To control how scores are aggregated, pass a StatisticalMode instance as the statistical_mode argument when running a metric.
| Mode | Returns | Best for |
|---|---|---|
| FrequentistMode | Single point estimate (weighted mean) | Quick analysis, dashboards |
| BayesianMode | Mean + credible interval (bootstrapped) | Research, uncertainty quantification |

Frequentist mode (default)

Returns a single value — the weighted mean of all interaction scores.
```python
from gaussia.statistical import FrequentistMode
from gaussia.metrics.context import Context

# MyRetriever and model are placeholders for your own retriever and model.
results = Context.run(
    MyRetriever,
    model=model,
    statistical_mode=FrequentistMode(),
)

for r in results:
    print(f"Context awareness: {r.context_awareness:.3f}")
    # context_awareness_ci_low and context_awareness_ci_high are None
```

Primitives

| Method | Returns |
|---|---|
| rate_estimation(successes, trials) | float: simple ratio successes / trials |
| aggregate_metrics(metrics, weights) | float: weighted sum |
| dispersion_metric(values, center) | float: mean absolute deviation |
| distribution_divergence(observed, reference) | float: total variation distance |
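The four frequentist primitives reduce to short formulas. The standalone functions below are a sketch of those formulas, not gaussia's own code; the argument types (lists of floats for scores and weights, dicts mapping a category to a count or probability for distributions) and the "median" option for center are assumptions.

```python
import statistics


def rate_estimation(successes: int, trials: int) -> float:
    # Simple ratio of successes to trials.
    return successes / trials


def aggregate_metrics(metrics: list[float], weights: list[float]) -> float:
    # Weighted sum; equals the weighted mean when the weights sum to 1.
    return sum(m * w for m, w in zip(metrics, weights))


def dispersion_metric(values: list[float], center: str = "mean") -> float:
    # Mean absolute deviation around the mean (or the median, if requested).
    c = statistics.median(values) if center == "median" else statistics.mean(values)
    return sum(abs(v - c) for v in values) / len(values)


def distribution_divergence(observed: dict[str, float],
                            reference: dict[str, float]) -> float:
    # Total variation distance: half the L1 distance between the two
    # distributions, each normalized to sum to 1 first.
    obs_total = sum(observed.values())
    ref_total = sum(reference.values())
    categories = set(observed) | set(reference)
    return 0.5 * sum(
        abs(observed.get(k, 0) / obs_total - reference.get(k, 0) / ref_total)
        for k in categories
    )
```

With weights that sum to 1, aggregate_metrics produces the weighted mean that FrequentistMode reports; for example, rate_estimation(3, 4) returns 0.75.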

Bayesian mode

Returns a mean with a credible interval, computed via bootstrap resampling.
```python
from gaussia.statistical import BayesianMode
from gaussia.metrics.context import Context

results = Context.run(
    MyRetriever,
    model=model,
    statistical_mode=BayesianMode(
        mc_samples=5000,   # Number of Monte Carlo bootstrap samples
        ci_level=0.95,     # 95% credible interval
    ),
)

for r in results:
    print(f"Context awareness: {r.context_awareness:.3f}")
    print(f"95% CI: [{r.context_awareness_ci_low:.3f}, {r.context_awareness_ci_high:.3f}]")
```
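The page does not specify which bootstrap scheme BayesianMode uses. One common way to get a credible interval by resampling is the Bayesian bootstrap: each Monte Carlo draw reweights the observed scores with Dirichlet(1, ..., 1) weights, and the interval bounds are percentiles of the resulting weighted means. The sketch below assumes that scheme; bayesian_bootstrap_mean is a hypothetical helper whose mc_samples and ci_level arguments only mirror the BayesianMode parameters of the same name.

```python
import random


def bayesian_bootstrap_mean(scores, mc_samples=5000, ci_level=0.95, seed=None):
    """Posterior mean and equal-tailed credible interval for the mean score.

    Hypothetical helper: uses the Bayesian bootstrap, which may differ from
    the resampling scheme gaussia actually uses.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(mc_samples):
        # Dirichlet(1, ..., 1) weights are normalized Exponential(1) variates.
        raw = [rng.expovariate(1.0) for _ in scores]
        total = sum(raw)
        draws.append(sum(w * s for w, s in zip(raw, scores)) / total)
    draws.sort()
    tail = (1.0 - ci_level) / 2
    return {
        "mean": sum(draws) / mc_samples,
        # Percentiles of the sorted draws give the interval bounds.
        "ci_low": draws[int(tail * (mc_samples - 1))],
        "ci_high": draws[int((1.0 - tail) * (mc_samples - 1))],
    }


scores = [0.62, 0.71, 0.55, 0.80, 0.68]  # hypothetical per-interaction scores
summary = bayesian_bootstrap_mean(scores, seed=42)
print(f"{summary['mean']:.3f} [{summary['ci_low']:.3f}, {summary['ci_high']:.3f}]")
```

Because every draw is a weighted average of the observed scores, the interval always lies between the smallest and largest score, and it narrows as more scores are added.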

Configuration

| Parameter | Default | Description |
|---|---|---|
| mc_samples | 5000 | Number of Monte Carlo bootstrap samples |
| ci_level | 0.95 | Credible interval level (e.g., 0.95 for 95%) |
| dirichlet_prior | 1.0 | Dirichlet prior concentration for distribution divergence |
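The dirichlet_prior parameter suggests that the Bayesian distribution divergence places a symmetric Dirichlet prior on the observed category probabilities. Under that assumption, the posterior given observed counts is Dirichlet(counts + dirichlet_prior), and sampling from it yields a distribution of total variation distances to the reference. bayesian_tv_divergence is a hypothetical sketch of this idea, not gaussia's implementation.

```python
import random


def bayesian_tv_divergence(observed_counts, reference, dirichlet_prior=1.0,
                           mc_samples=5000, ci_level=0.95, seed=None):
    """Posterior summary of the total variation distance between the category
    distribution behind observed_counts and a fixed reference distribution.

    Hypothetical helper: assumes a symmetric Dirichlet(dirichlet_prior) prior,
    which must be > 0.
    """
    rng = random.Random(seed)
    categories = sorted(set(observed_counts) | set(reference))
    ref_total = sum(reference.values())
    ref_probs = [reference.get(c, 0) / ref_total for c in categories]
    draws = []
    for _ in range(mc_samples):
        # Sample the posterior Dirichlet(counts + prior) via normalized Gammas.
        raw = [rng.gammavariate(observed_counts.get(c, 0) + dirichlet_prior, 1.0)
               for c in categories]
        total = sum(raw)
        draws.append(0.5 * sum(abs(x / total - q) for x, q in zip(raw, ref_probs)))
    draws.sort()
    tail = (1.0 - ci_level) / 2
    return {
        "mean": sum(draws) / mc_samples,
        "ci_low": draws[int(tail * (mc_samples - 1))],
        "ci_high": draws[int((1.0 - tail) * (mc_samples - 1))],
    }


reference = {"a": 0.5, "b": 0.5}
balanced = bayesian_tv_divergence({"a": 50, "b": 50}, reference, seed=0)
skewed = bayesian_tv_divergence({"a": 90, "b": 10}, reference, seed=0)
```

A larger dirichlet_prior pulls the sampled distributions toward uniform, which matters most when some categories have few or zero observations.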

Primitives

| Method | Returns |
|---|---|
| rate_estimation(successes, trials) | dict with mean, ci_low, ci_high, samples |
| aggregate_metrics(metrics, weights) | dict with mean, ci_low, ci_high |
| dispersion_metric(values, center) | dict with mean, ci_low, ci_high |
| distribution_divergence(observed, reference) | dict with mean, ci_low, ci_high |
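For rate_estimation, the Bayesian result also carries the posterior samples. A standard way to produce them is Beta-Binomial conjugacy: with a Beta(1, 1) (uniform) prior, the posterior of the success rate after s successes in n trials is Beta(1 + s, 1 + n - s). The sketch assumes this model; bayesian_rate_estimation and its prior_alpha and prior_beta arguments are hypothetical, not part of gaussia's documented API.

```python
import random


def bayesian_rate_estimation(successes, trials, mc_samples=5000, ci_level=0.95,
                             prior_alpha=1.0, prior_beta=1.0, seed=None):
    """Beta-Binomial summary of a success rate (hypothetical helper).

    With a Beta(prior_alpha, prior_beta) prior, the posterior after observing
    `successes` out of `trials` is Beta(prior_alpha + successes,
    prior_beta + trials - successes).
    """
    rng = random.Random(seed)
    a = prior_alpha + successes
    b = prior_beta + (trials - successes)
    # Posterior draws, sorted so percentiles can be read off by index.
    samples = sorted(rng.betavariate(a, b) for _ in range(mc_samples))
    tail = (1.0 - ci_level) / 2
    return {
        "mean": sum(samples) / mc_samples,
        "ci_low": samples[int(tail * (mc_samples - 1))],
        "ci_high": samples[int((1.0 - tail) * (mc_samples - 1))],
        "samples": samples,
    }


small = bayesian_rate_estimation(3, 4, seed=1)     # 3 successes in 4 trials
large = bayesian_rate_estimation(75, 100, seed=1)  # 75 successes in 100 trials
for name, r in (("3/4", small), ("75/100", large)):
    print(f"{name}: mean={r['mean']:.3f}, 95% CI=[{r['ci_low']:.3f}, {r['ci_high']:.3f}]")
```

Both inputs have the same frequentist rate of 0.75, but the interval for 3 of 4 is several times wider than for 75 of 100, which is the small-sample situation where an interval is more informative than a point estimate. The posterior means also sit slightly below 0.75 because the uniform prior shrinks them toward 0.5.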

When to use which

Use FrequentistMode when...

  • You need fast, simple results
  • You’re building dashboards or CI pipelines
  • Sample sizes are large enough for stable estimates

Use BayesianMode when...

  • You need uncertainty quantification
  • Sample sizes are small
  • You’re comparing metrics across experiments
  • You’re writing research papers

Custom modes

You can implement your own StatisticalMode by subclassing the abstract base class:
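```python
from gaussia.statistical.base import StatisticalMode


class MyCustomMode(StatisticalMode):
    def rate_estimation(self, successes, trials):
        # Estimate a success rate from counts.
        ...

    def aggregate_metrics(self, metrics, weights):
        # Combine per-interaction scores using the given weights.
        ...

    def dispersion_metric(self, values, center="mean"):
        # Measure the spread of values around the chosen center.
        ...

    def distribution_divergence(self, observed, reference, divergence_type="total_variation"):
        # Compare an observed distribution against a reference distribution.
        ...

    def get_result_type(self) -> str:
        # "point_estimate" for single values, "distribution" for mean/interval results.
        return "point_estimate"  # or "distribution"
```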
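As an illustration of the subclassing pattern, the sketch below implements a robust point-estimate mode that aggregates with a weighted median and measures dispersion with the median absolute deviation. So that it runs without gaussia installed, it declares a local stand-in for the abstract base class with the same five methods; in real code you would subclass gaussia.statistical.base.StatisticalMode instead. MedianMode and the stand-in base class are hypothetical.

```python
import statistics
from abc import ABC, abstractmethod


class StatisticalMode(ABC):
    """Hypothetical local stand-in for gaussia.statistical.base.StatisticalMode."""

    @abstractmethod
    def rate_estimation(self, successes, trials): ...

    @abstractmethod
    def aggregate_metrics(self, metrics, weights): ...

    @abstractmethod
    def dispersion_metric(self, values, center="mean"): ...

    @abstractmethod
    def distribution_divergence(self, observed, reference,
                                divergence_type="total_variation"): ...

    @abstractmethod
    def get_result_type(self) -> str: ...


class MedianMode(StatisticalMode):
    """Robust point-estimate mode (hypothetical example)."""

    def rate_estimation(self, successes, trials):
        # Same simple ratio as the frequentist primitive.
        return successes / trials

    def aggregate_metrics(self, metrics, weights):
        # Weighted median: smallest metric whose cumulative weight reaches half the total.
        half = sum(weights) / 2
        cumulative = 0.0
        for value, weight in sorted(zip(metrics, weights)):
            cumulative += weight
            if cumulative >= half:
                return value
        raise ValueError("metrics and weights must be non-empty")

    def dispersion_metric(self, values, center="mean"):
        # Median absolute deviation around the mean or the median.
        c = statistics.median(values) if center == "median" else statistics.mean(values)
        return statistics.median(abs(v - c) for v in values)

    def distribution_divergence(self, observed, reference,
                                divergence_type="total_variation"):
        # Only total variation distance is supported in this sketch.
        if divergence_type != "total_variation":
            raise NotImplementedError(divergence_type)
        obs_total = sum(observed.values())
        ref_total = sum(reference.values())
        categories = set(observed) | set(reference)
        return 0.5 * sum(
            abs(observed.get(k, 0) / obs_total - reference.get(k, 0) / ref_total)
            for k in categories
        )

    def get_result_type(self) -> str:
        return "point_estimate"
```

Assuming it subclasses the real gaussia base class, an instance would be passed like the built-in modes, for example statistical_mode=MedianMode().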