Toxicity Metric
The Toxicity metric measures toxic language in AI responses using clustering and the DIDT (Directed Toxicity, Demographic Representation, Associated Sentiment Bias) framework.
Overview
The metric provides:
Cluster profiling: Groups similar responses using HDBSCAN + UMAP and measures toxicity per cluster
DIDT framework with three components:
  DR (Demographic Representation): Distribution divergence of group mention rates
  DTO (Directed Toxicity per Group): Toxicity rate dispersion across demographic groups
  ASB (Associated Sentiment Bias): Sentiment deviation across groups
Installation
uv add "gaussia[toxicity]"
Basic Usage
from gaussia.embedders import SentenceTransformerEmbedder
from gaussia.metrics.toxicity import Toxicity
from gaussia.statistical import FrequentistMode
from your_retriever import MyRetriever

# Define group prototypes for demographic detection
group_prototypes = {
    "gender": ["women", "men", "female", "male", "woman", "man"],
    "race": ["Asian", "African", "European", "Hispanic", "Black", "White"],
    "religion": ["Christian", "Muslim", "Jewish", "Hindu", "Buddhist"],
    "sexuality": ["LGBTQ+", "gay", "lesbian", "bisexual", "heterosexual"],
}

# Run the metric
metrics = Toxicity.run(
    MyRetriever,
    embedder=SentenceTransformerEmbedder("all-MiniLM-L6-v2"),
    group_prototypes=group_prototypes,
    group_default_threshold=0.40,
    statistical_mode=FrequentistMode(),
    verbose=True,
)

# Analyze results
for metric in metrics:
    print(f"Session: {metric.session_id}")

    # Cluster profiling
    print("Cluster Profiling:")
    for cluster_id, toxicity in metric.cluster_profiling.items():
        print(f"  Cluster {cluster_id}: {toxicity:.4f}")

    # Group profiling
    if metric.group_profiling:
        gp = metric.group_profiling.frequentist
        print(f"DIDT: {gp.DIDT:.4f}")
        print(f"  DR:  {gp.DR:.4f}")
        print(f"  ASB: {gp.ASB:.4f}")
        print(f"  DTO: {gp.DTO:.4f}")
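Once the per-cluster scores are available, ranking them is often the first step in a review. The following is a minimal sketch, assuming `cluster_profiling` behaves like a plain mapping of cluster ID to toxicity score (as the output schema suggests). `rank_clusters` is a hypothetical helper, not part of gaussia, and the scores are made up for illustration.

```python
def rank_clusters(
    cluster_profiling: dict[float, float], top_n: int = 3
) -> list[tuple[float, float]]:
    """Return the top_n (cluster_id, toxicity) pairs, most toxic first."""
    ranked = sorted(cluster_profiling.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Illustrative scores only, not real metric output
example_profiling = {0.0: 0.02, 1.0: 0.31, 2.0: 0.07}
top_clusters = rank_clusters(example_profiling, top_n=2)

for cluster_id, toxicity in top_clusters:
    print(f"Cluster {cluster_id}: {toxicity:.4f}")
```

With real results, `metric.cluster_profiling` would presumably be passed in place of `example_profiling`.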
Required Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| retriever | Type[Retriever] | Data source class |
Group Detection Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| group_prototypes | dict[str, list[str]] | None | Prototype phrases for each demographic group |
| group_thresholds | dict[str, float] | None | Per-group similarity thresholds |
| group_default_threshold | float | 0.50 | Default threshold for group detection |
| group_toxicity_threshold | float | 0.5 | Threshold for toxic classification |
| group_extractor | BaseGroupExtractor | Auto | Custom group extractor (overrides prototypes) |
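The relationship between `group_thresholds` and `group_default_threshold` can be pictured as a per-group lookup with a fallback. The sketch below is an assumption about that resolution order, not gaussia's actual code. `resolve_threshold` and `detect_groups` are hypothetical helpers, the `>=` comparison is a guess, and the similarity scores stand in for whatever similarity the embedder produces between a response and each group's prototypes.

```python
from typing import Optional


def resolve_threshold(
    group: str,
    group_thresholds: Optional[dict[str, float]],
    group_default_threshold: float = 0.50,
) -> float:
    """Use the per-group threshold if one is set, otherwise the default."""
    if group_thresholds and group in group_thresholds:
        return group_thresholds[group]
    return group_default_threshold


def detect_groups(
    similarities: dict[str, float],
    group_thresholds: Optional[dict[str, float]] = None,
    group_default_threshold: float = 0.50,
) -> list[str]:
    """Return the groups whose similarity meets their resolved threshold."""
    return [
        group
        for group, score in similarities.items()
        if score >= resolve_threshold(group, group_thresholds, group_default_threshold)
    ]


# Illustrative similarity scores, not real embedder output
similarities = {"gender": 0.38, "race": 0.38, "religion": 0.62}
detected = detect_groups(similarities, group_thresholds={"gender": 0.35})
print(detected)
```

Here "gender" and "race" have the same score, but only "gender" is detected because it has a lower per-group threshold; "race" falls back to the default.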
Embedding Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| embedder | Embedder | (required) | Embedder instance for encoding text |
Clustering Parameters (HDBSCAN)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| toxicity_min_cluster_size | int | 5 | Minimum cluster size |
| toxicity_cluster_selection_epsilon | float | 0.0 | Cluster selection epsilon |
| toxicity_cluster_selection_method | str | "eom" | Selection method ("eom" or "leaf") |
| toxicity_cluster_use_latent_space | bool | True | Use UMAP latent space for clustering |
UMAP Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| umap_n_components | int | 2 | Number of UMAP dimensions |
| umap_n_neighbors | int | 15 | Number of neighbors |
| umap_min_dist | float | 0.1 | Minimum distance |
| umap_random_state | int | 42 | Random seed |
| umap_metric | str | "cosine" | Distance metric |
DIDT Weight Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| w_DR | float | 1/3 | Weight for DR component |
| w_ASB | float | 1/3 | Weight for ASB component |
| w_DTO | float | 1/3 | Weight for DTO component |
Other Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| statistical_mode | StatisticalMode | FrequentistMode() | Statistical analysis mode |
| toxicity_loader | Type[ToxicityLoader] | HurtlexLoader | Toxicity lexicon loader |
| sentiment_analyzer | SentimentAnalyzer | None | Optional sentiment analyzer for ASB |
| verbose | bool | False | Enable verbose logging |
Statistical Modes
Frequentist Mode
from gaussia.embedders import SentenceTransformerEmbedder
from gaussia.metrics.toxicity import Toxicity
from gaussia.statistical import FrequentistMode

# MyRetriever and group_prototypes are defined as in Basic Usage
metrics = Toxicity.run(
    MyRetriever,
    embedder=SentenceTransformerEmbedder("all-MiniLM-L6-v2"),
    group_prototypes=group_prototypes,
    statistical_mode=FrequentistMode(),
)

# Returns point estimates
gp = metrics[0].group_profiling.frequentist
print(f"DIDT: {gp.DIDT}")  # Single float value
Bayesian Mode
from gaussia.embedders import SentenceTransformerEmbedder
from gaussia.metrics.toxicity import Toxicity
from gaussia.statistical import BayesianMode

bayesian = BayesianMode(
    mc_samples=5000,
    ci_level=0.95,
    dirichlet_prior=1.0,
    beta_prior_a=1.0,
    beta_prior_b=1.0,
    rng_seed=42,
)

# MyRetriever and group_prototypes are defined as in Basic Usage
metrics = Toxicity.run(
    MyRetriever,
    embedder=SentenceTransformerEmbedder("all-MiniLM-L6-v2"),
    group_prototypes=group_prototypes,
    statistical_mode=bayesian,
)

# Returns distributions with credible intervals
summary = metrics[0].group_profiling.bayesian.summary
didt = summary["DIDT"]
print(f"DIDT: {didt.mean:.4f} [{didt.ci_low:.4f}, {didt.ci_high:.4f}]")
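To build intuition for what `mc_samples`, `ci_level`, `beta_prior_a`, `beta_prior_b`, and `rng_seed` might control, the sketch below draws Monte Carlo samples from a Beta posterior over a single group's toxicity rate and reads off a credible interval. This is a conceptual illustration using only the Python standard library, and it is an assumption that gaussia does something comparable internally. `beta_credible_interval` is a hypothetical helper whose parameter names merely mirror the `BayesianMode` arguments, and the counts are invented.

```python
import random


def beta_credible_interval(
    toxic: int,
    total: int,
    prior_a: float = 1.0,
    prior_b: float = 1.0,
    mc_samples: int = 5000,
    ci_level: float = 0.95,
    rng_seed: int = 42,
) -> tuple[float, float, float]:
    """Return (mean, ci_low, ci_high) for a Beta posterior via Monte Carlo."""
    rng = random.Random(rng_seed)
    # Conjugate update: Beta(a + toxic mentions, b + non-toxic mentions)
    post_a = prior_a + toxic
    post_b = prior_b + (total - toxic)
    samples = sorted(rng.betavariate(post_a, post_b) for _ in range(mc_samples))
    # Equal-tailed interval read from the sorted samples
    tail = (1.0 - ci_level) / 2.0
    low_idx = int(tail * (mc_samples - 1))
    high_idx = int((1.0 - tail) * (mc_samples - 1))
    mean = sum(samples) / mc_samples
    return mean, samples[low_idx], samples[high_idx]


# Illustrative counts: 3 toxic mentions out of 40 for one group
mean, ci_low, ci_high = beta_credible_interval(toxic=3, total=40)
print(f"rate: {mean:.4f} [{ci_low:.4f}, {ci_high:.4f}]")
```

With a uniform Beta(1, 1) prior, the posterior mean is (1 + 3) / (2 + 40), so small groups are pulled toward 0.5 and their intervals stay wide, which is the main practical difference from a point estimate.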
DIDT Components
DR (Demographic Representation)
Measures how evenly different demographic groups are mentioned in responses.
0: Perfect balance, with all groups mentioned equally
1: Complete imbalance, with only one group mentioned
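One way to obtain a score with exactly these endpoints is a normalized total variation distance between the observed mention shares and a uniform distribution. The sketch below uses that measure as a stand-in; the divergence gaussia actually uses for DR is not specified here, so treat `representation_imbalance` as a hypothetical illustration rather than the library's formula.

```python
def representation_imbalance(mention_counts: dict[str, int]) -> float:
    """Normalized total variation distance between mention shares and uniform."""
    groups = len(mention_counts)
    total = sum(mention_counts.values())
    if groups < 2 or total == 0:
        return 0.0
    uniform = 1.0 / groups
    tv = 0.5 * sum(abs(count / total - uniform) for count in mention_counts.values())
    # The largest possible distance occurs when one group receives every mention
    return tv / (1.0 - uniform)


# Illustrative mention counts, not real metric output
balanced = representation_imbalance({"gender": 10, "race": 10, "religion": 10})
skewed = representation_imbalance({"gender": 30, "race": 0, "religion": 0})
print(f"balanced: {balanced:.2f}, skewed: {skewed:.2f}")
```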
ASB (Associated Sentiment Bias)
Measures sentiment differences when discussing different groups.
0: Consistent sentiment across all groups
1: Extreme sentiment variation between groups
ASB requires a sentiment_analyzer to be provided. Without it, ASB defaults to 0.
DTO (Directed Toxicity per Group)
Measures toxicity rate variation across groups.
0: Equal toxicity rates across all groups
1: Toxicity concentrated in specific groups
DIDT (Aggregate Score)
Weighted combination of DR, ASB, and DTO:
DIDT = w_DR * DR + w_ASB * ASB + w_DTO * DTO
Default weights are equal (1/3 each).
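The weighted sum can be reproduced by hand to check how a change in weights shifts the aggregate. This is a minimal sketch of the formula above; `didt_score` is a hypothetical helper, the component values are invented, and whether gaussia normalizes weights that do not sum to 1 is not stated here.

```python
def didt_score(
    dr: float,
    asb: float,
    dto: float,
    w_dr: float = 1 / 3,
    w_asb: float = 1 / 3,
    w_dto: float = 1 / 3,
) -> float:
    """Weighted combination of the three DIDT components."""
    return w_dr * dr + w_asb * asb + w_dto * dto


# Illustrative component values, not real metric output
equal = didt_score(dr=0.30, asb=0.00, dto=0.60)
dto_heavy = didt_score(dr=0.30, asb=0.00, dto=0.60, w_dr=0.2, w_asb=0.2, w_dto=0.6)
print(f"equal weights: {equal:.4f}, DTO-heavy weights: {dto_heavy:.4f}")
```

Because ASB is 0 when no `sentiment_analyzer` is supplied, equal weights cap the aggregate at 2/3 in that configuration, assuming DR and DTO each stay within [0, 1].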
Output Schema
ToxicityMetric
class ToxicityMetric(BaseMetric):
    session_id: str
    assistant_id: str
    cluster_profiling: dict[float, float]  # cluster_id -> toxicity_score
    group_profiling: GroupProfiling | None
    assistant_space: AssistantSpace
GroupProfiling
class GroupProfiling(BaseModel):
    mode: Literal["frequentist", "bayesian"]
    groups: list[str]  # Detected groups
    N_i: dict[str, int]  # Mention counts per group
    K_i: dict[str, int]  # Toxic mention counts per group
    frequentist: FrequentistGroupProfiling | None
    bayesian: BayesianGroupProfiling | None
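The raw counts `N_i` and `K_i` make it possible to inspect per-group toxicity rates directly, which helps explain a high DTO value. A minimal sketch, assuming both fields are plain dictionaries keyed by group name; `toxicity_rates` is a hypothetical helper and the counts are invented.

```python
def toxicity_rates(n_i: dict[str, int], k_i: dict[str, int]) -> dict[str, float]:
    """Per-group toxic share K_i / N_i, skipping groups with no mentions."""
    return {group: k_i.get(group, 0) / n for group, n in n_i.items() if n > 0}


# Illustrative counts, not real metric output
rates = toxicity_rates(
    {"gender": 20, "race": 5, "religion": 0},
    {"gender": 2, "race": 1, "religion": 0},
)
for group, rate in rates.items():
    print(f"{group}: {rate:.2%}")
```

Groups with zero mentions are dropped rather than reported as 0, since a rate is undefined without any mentions.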
Advanced Usage
Custom Group Prototypes
# Define prototypes relevant to your domain
group_prototypes = {
    "age": ["young", "old", "elderly", "teenager", "millennial", "boomer"],
    "occupation": ["doctor", "lawyer", "teacher", "engineer", "artist"],
    "socioeconomic": ["wealthy", "poor", "middle-class", "homeless"],
}

metrics = Toxicity.run(
    MyRetriever,
    embedder=SentenceTransformerEmbedder("all-MiniLM-L6-v2"),
    group_prototypes=group_prototypes,
)
# Alternatively, build a group extractor explicitly and pass it via
# group_extractor, which overrides the prototype-based parameters
from gaussia.embedders import SentenceTransformerEmbedder
from gaussia.extractors.embedding import EmbeddingGroupExtractor

embedder = SentenceTransformerEmbedder("paraphrase-multilingual-MiniLM-L12-v2")

extractor = EmbeddingGroupExtractor(
    embedder=embedder,
    group_prototypes=group_prototypes,
    thresholds={"gender": 0.35, "race": 0.40},
    default_threshold=0.45,
)

metrics = Toxicity.run(
    MyRetriever,
    embedder=embedder,
    group_extractor=extractor,
)
Custom Clustering
# Fine-tune clustering for your data
metrics = Toxicity.run(
MyRetriever,
embedder = SentenceTransformerEmbedder( "all-MiniLM-L6-v2" ),
group_prototypes = group_prototypes,
toxicity_min_cluster_size = 10 ,
toxicity_cluster_selection_method = "leaf" ,
umap_n_neighbors = 30 ,
umap_min_dist = 0.05 ,
)
Visualizing Clusters
import matplotlib.pyplot as plt
import numpy as np

metric = metrics[0]
latent_space = np.array(metric.assistant_space.latent_space)
labels = np.array(metric.assistant_space.cluster_labels)

plt.figure(figsize=(10, 8))
scatter = plt.scatter(
    latent_space[:, 0],
    latent_space[:, 1],
    c=labels,
    cmap="tab10",
    alpha=0.7,
)
plt.colorbar(scatter, label="Cluster")
plt.xlabel("UMAP Dimension 1")
plt.ylabel("UMAP Dimension 2")
plt.title("Response Clusters (Toxicity Analysis)")
plt.show()
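Alongside the plot, a quick count of points per cluster helps judge whether `toxicity_min_cluster_size` is set sensibly. The sketch below assumes `assistant_space.cluster_labels` carries raw HDBSCAN labels, where -1 conventionally marks noise points; that is an assumption about gaussia's output. `cluster_sizes` is a hypothetical helper and the labels are invented.

```python
from collections import Counter


def cluster_sizes(cluster_labels) -> tuple[dict[int, int], int]:
    """Count points per cluster label and report noise (-1) separately."""
    counts = Counter(int(label) for label in cluster_labels)
    noise = counts.pop(-1, 0)
    return dict(sorted(counts.items())), noise


# Illustrative labels, not real clustering output
sizes, noise_points = cluster_sizes([0, 0, 1, -1, 1, 1, -1, 2])
print(f"cluster sizes: {sizes}, noise points: {noise_points}")
```

A large share of noise points usually suggests lowering the minimum cluster size or adjusting the UMAP neighborhood settings.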
Mixed-language datasets are not supported. Toxic word sets differ per language, so accumulating toxicity flags across languages produces unreliable results. A warning is emitted if multiple languages are detected.
Next Steps
Bias Metric: Learn about bias detection
Statistical Modes: Understand Frequentist vs Bayesian