Gaussia is an open-source library for evaluating AI-generated content with scientific rigor. Every metric is backed by peer-reviewed research, so your evaluations become part of the scientific record, not just a number in a dashboard. Born at Alquimia AI Labs.
Why Gaussia
There’s a thriving ecosystem of tools for evaluating language models. Most of them work. And yet, if you ask an engineering team why a faithfulness score of 0.83 is trustworthy, the most common answer is: “Because that’s what the dashboard says.”

That’s not a technical problem. It’s an epistemological one. Evaluating AI systems with metrics that no one can trace, cite, or reproduce is exactly the kind of magical thinking those systems taught us to avoid.

Existing tools treat quality, security, and ethics in silos. No single tool covers all three with scientific rigor. When a tool gives you a score of 0.83, you can’t cite the paper that defines what 0.83 means. And every tool assumes you’re evaluating an AI model, but intelligent behavior can come from humans too.

Gaussia starts from a different premise: the unit of analysis is the behavior, not the architecture. A behavior can come from an LLM, a voice agent in a call center, a human operator, or a hybrid system.
The methodology
Every metric in Gaussia comes with a contract: explicit scientific backing. When you use a metric, you know exactly what paper defined it, how it was validated, and how to cite it in your own work.
Paper first
No metric exists without its paper. Each one ships with the paper’s title, authors, year, venue, arXiv ID or DOI, plus implementation notes, validation datasets, and a BibTeX entry.
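The “paper first” contract can be pictured as a small record type. The sketch below is hypothetical: `PaperReference`, `MetricContract`, and `to_bibtex` are illustrative names, not Gaussia’s published API, and requiring a DOI or arXiv ID is only an assumed way of enforcing “no metric exists without its paper.” The BibTeX renderer also assumes the paper is a conference paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PaperReference:
    """Hypothetical record of the paper that defines a metric."""
    key: str                 # BibTeX citation key
    title: str
    authors: list[str]
    year: int
    venue: str
    doi: str | None = None
    arxiv_id: str | None = None
    validation_datasets: list[str] = field(default_factory=list)
    implementation_notes: str = ""

    def to_bibtex(self) -> str:
        """Render a BibTeX entry; assumes a conference paper (@inproceedings)."""
        entries = {
            "title": self.title,
            "author": " and ".join(self.authors),
            "year": str(self.year),
            "booktitle": self.venue,
        }
        if self.doi:
            entries["doi"] = self.doi
        if self.arxiv_id:
            entries["eprint"] = self.arxiv_id
            entries["archivePrefix"] = "arXiv"
        body = ",\n".join(f"  {name} = {{{value}}}" for name, value in entries.items())
        return f"@inproceedings{{{self.key},\n{body}\n}}"


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical binding of a metric name to the paper behind it."""
    name: str
    paper: PaperReference

    def __post_init__(self) -> None:
        # Assumed enforcement of "no metric exists without its paper":
        # the paper must be resolvable through a DOI or an arXiv ID.
        if not (self.paper.doi or self.paper.arxiv_id):
            raise ValueError(f"metric {self.name!r} has no DOI or arXiv ID")
```

Deriving the BibTeX entry from the same fields that describe the metric keeps the citation and the metadata in sync; storing the authors’ own BibTeX entry verbatim would be an equally reasonable design.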
Reproducible
Every implementation follows the exact methodology described in the paper, so you can run the same validation the authors did.
Citeable
When you use Gaussia in production or research, you can cite the underlying papers. Your evaluations become part of the scientific record.
How metrics get added
Gaussia doesn’t implement metrics because they sound good. Every metric goes through a public review process before a single line of code is written.
Paper proposal
Anyone can open an issue with a reference to a peer-reviewed paper. The discussion starts with the science, not the code.
Methodology discussion
Open debate about the paper’s assumptions, limitations, and how the implementation should map to the methodology.
What makes it different
Multi-environment native
Native implementations for every major environment — server, edge, browser, and embedded. Not wrappers around a single runtime.
Multimodal by design
Text, audio, image, video. Intelligence doesn’t communicate only with text. Neither should evaluation.
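Taken together, “the unit of analysis is the behavior” and “multimodal by design” suggest an input type that records who produced an output and in what medium, while metrics score only the output itself. The following is a hypothetical sketch: `Behavior`, `Source`, `Modality`, `evaluate`, and the toy `non_empty` metric are illustrative names, not part of Gaussia’s API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Source(Enum):
    """Who or what produced the behavior (categories taken from the text)."""
    LLM = "llm"
    VOICE_AGENT = "voice_agent"
    HUMAN = "human"
    HYBRID = "hybrid"


class Modality(Enum):
    TEXT = "text"
    AUDIO = "audio"
    IMAGE = "image"
    VIDEO = "video"


@dataclass(frozen=True)
class Behavior:
    source: Source         # kept as metadata; metrics should not branch on it
    modality: Modality
    content: str | bytes   # text as str; audio/image/video as encoded bytes


# A metric maps one observed behavior to a score.
Metric = Callable[[Behavior], float]


def evaluate(behaviors: list[Behavior], metric: Metric) -> list[float]:
    """Apply the same metric to every behavior, whatever produced it."""
    return [metric(b) for b in behaviors]


def non_empty(behavior: Behavior) -> float:
    """Toy metric for illustration only: 1.0 if any content was produced."""
    return 1.0 if behavior.content else 0.0


if __name__ == "__main__":
    sample = [
        Behavior(Source.LLM, Modality.TEXT, "Your refund has been issued."),
        Behavior(Source.HUMAN, Modality.TEXT, "Your refund has been issued."),
        Behavior(Source.VOICE_AGENT, Modality.AUDIO, b""),
    ]
    print(evaluate(sample, non_empty))  # [1.0, 1.0, 0.0]
```

Because `source` travels as metadata instead of selecting an evaluator, the same metric gives an LLM reply and a human operator’s reply the same score when their content matches.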
Infrastructure, not platform
MIT license. No telemetry. No lock-in. Build your own dashboards, auditing services, or compliance tools on top.
SDKs
Every metric starts as a paper. Once approved, the community builds official SDKs that implement the science in your language of choice.

| SDK | Status |
|---|---|
| Python | Stable |
| TypeScript | Beta |
| Rust | Planned |
| C++ | Planned |
| Swift | Planned |
| Go | Planned |
Explore
Paper index
All referenced papers with BibTeX entries and summaries.
GitHub
Source code, issues, RFCs, and contribution guidelines.