Introduction

Gaussia is an open-source library for evaluating AI-generated content with scientific rigor. Every metric is backed by peer-reviewed research, so your evaluations become part of the scientific record, not just a number in a dashboard. Gaussia was born at Alquimia AI Labs.

Why Gaussia

There’s a thriving ecosystem of tools for evaluating language models. Most of them work. And yet, if you ask an engineering team why a faithfulness score of 0.83 is trustworthy, the most common answer is: “Because that’s what the dashboard says.”

That’s not a technical problem. It’s an epistemological one. Evaluating AI systems with metrics that no one can trace, cite, or reproduce is exactly the kind of magical thinking those systems taught us to avoid.

Existing tools treat quality, security, and ethics in silos. No single tool covers all three with scientific rigor. When a tool gives you a score of 0.83, you can’t cite the paper that defines what 0.83 means. And every tool assumes you’re evaluating an AI model, but intelligent behavior can come from humans too.

Gaussia starts from a different premise: the unit of analysis is the behavior, not the architecture. A behavior can come from an LLM, a voice agent in a call center, a human operator, or a hybrid system.

The methodology

Every metric in Gaussia comes with a contract: explicit scientific backing. When you use a metric, you know exactly what paper defined it, how it was validated, and how to cite it in your own work.

Paper first

No metric exists without its paper. Title, authors, year, venue, arXiv/DOI, implementation notes, validation datasets, and BibTeX entry — all included.

Reproducible

Every implementation follows the exact methodology described in the paper. Run the same validation the authors did.

Citeable

When you use Gaussia in production or research, you can cite the underlying papers. Your evaluations become part of the scientific record.
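To make the contract concrete, here is a minimal sketch of how the paper metadata listed above could travel with a score. It does not use Gaussia's actual API: `PaperReference`, `ScoredResult`, and `to_bibtex` are hypothetical names chosen for illustration, and the paper details in the usage lines are placeholders, not a real reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PaperReference:
    """Hypothetical record of the paper behind a metric (not Gaussia's schema)."""

    title: str
    authors: tuple[str, ...]
    year: int
    venue: str
    identifier: str  # arXiv ID or DOI
    implementation_notes: str = ""
    validation_datasets: tuple[str, ...] = ()

    def to_bibtex(self, key: str) -> str:
        """Render a minimal BibTeX entry from the stored fields."""
        authors = " and ".join(self.authors)
        return (
            f"@article{{{key},\n"
            f"  title = {{{self.title}}},\n"
            f"  author = {{{authors}}},\n"
            f"  year = {{{self.year}}},\n"
            f"  journal = {{{self.venue}}},\n"
            f"  note = {{{self.identifier}}}\n"
            f"}}"
        )


@dataclass(frozen=True)
class ScoredResult:
    """Hypothetical result type: a score that always carries its source paper."""

    metric_name: str
    score: float
    paper: PaperReference


# Placeholder values only; this is not a real paper.
placeholder = PaperReference(
    title="Example Paper Title",
    authors=("A. Author", "B. Author"),
    year=2020,
    venue="Example Venue",
    identifier="arXiv:0000.00000",
)
result = ScoredResult(metric_name="faithfulness", score=0.83, paper=placeholder)
print(result.paper.to_bibtex("example2020"))
```

The design point of the sketch is that a score cannot be constructed without a reference, so whoever reads the 0.83 can also read where its definition came from.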

How metrics get added

Gaussia doesn’t implement metrics because they sound good. Every metric goes through a public review process before a single line of code is written.

Paper proposal

Anyone can open an issue with a reference to a peer-reviewed paper. The discussion starts with the science, not the code.

Methodology discussion

Open debate about the paper’s assumptions, limitations, and how the implementation should map to the methodology.

RFC & implementation

Only after consensus on interpretation does the RFC open. Every design decision is documented and traceable.

This means Gaussia’s metrics are publicly audited before they exist. The debate is visible. Disagreements are recorded. Implementation is traceable to documented decisions.
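The three stages above can be read as a small state machine. The sketch below is only an illustration of that ordering under the stated rule that the RFC opens after consensus; `Stage` and `next_stage` are hypothetical names, not part of Gaussia or its tooling.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the review stages described above."""

    PAPER_PROPOSAL = 1
    METHODOLOGY_DISCUSSION = 2
    RFC_AND_IMPLEMENTATION = 3


def next_stage(current: Stage, consensus_reached: bool) -> Stage:
    """Advance one stage; the RFC opens only after consensus on interpretation."""
    if current is Stage.PAPER_PROPOSAL:
        # An issue citing a peer-reviewed paper starts the discussion.
        return Stage.METHODOLOGY_DISCUSSION
    if current is Stage.METHODOLOGY_DISCUSSION:
        # Without consensus the metric stays in discussion.
        return Stage.RFC_AND_IMPLEMENTATION if consensus_reached else current
    # RFC and implementation is the final stage in this sketch.
    return current


stage = next_stage(Stage.PAPER_PROPOSAL, consensus_reached=False)
print(stage.name)
```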

What makes it different

Multi-environment native

Native implementations for every major environment — server, edge, browser, and embedded. Not wrappers around a single runtime.

Multimodal by design

Text, audio, image, video. Intelligence doesn’t communicate only with text. Neither should evaluation.

Infrastructure, not platform

MIT license. No telemetry. No lock-in. Build your own dashboards, auditing services, or compliance tools on top.

SDKs

Every metric starts as a paper. Once approved, the community builds official SDKs that implement the science in your language of choice.

SDK	Status
Python	Stable
TypeScript	Beta
Rust	Planned
C++	Planned
Swift	Planned
Go	Planned

Explore

Paper index

GitHub