Toxicity Score Examples
The following examples show how different types of language score across the toxicity categories. The table shows five of the six categories; scores fall between 0.00 and 1.00, and a higher score means the category is detected more strongly. Use these examples to calibrate your thresholds and to understand how the scoring behaves.
Score reference table
| Text | Toxicity | Insult | Obscene | Threat | Identity attack |
|---|---|---|---|---|---|
| "You disgusting piece of s***." | 1.00 | 0.96 | 0.97 | 0.00 | 0.00 |
| "You are ugly, nobody likes you." | 0.97 | 0.84 | 0.13 | 0.00 | 0.00 |
| "You're really dumb, aren't you?" | 0.96 | 0.90 | 0.39 | 0.00 | 0.00 |
| "If you make this mistake again, I will kill you" | 0.83 | 0.07 | 0.05 | 0.81 | 0.00 |
| "I hate people from India" | 0.80 | 0.03 | 0.01 | 0.01 | 0.47 |
| "Watch your back, or you'll be sorry." | 0.49 | 0.01 | 0.01 | 0.26 | 0.00 |
| "You're not very good at this, are you?" | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 |
| "I hope you understand the consequences of your actions." | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
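One way to see how a single cutoff splits these examples is to encode the table as data and sweep a few thresholds. This is a minimal sketch that assumes each text's scores are available as a dict keyed by the category names used on this page (`toxicity`, `insult`, `obscene`, `threat`, `identity_attack`). The `EXAMPLES` dict and the `flagged_at` helper are hypothetical illustrations, not part of any API.

```python
from __future__ import annotations

# Scores copied from the reference table (five of the six categories).
# Assumption: scores are exposed as a dict keyed by category name.
EXAMPLES: dict[str, dict[str, float]] = {
    "You disgusting piece of s***.": dict(toxicity=1.00, insult=0.96, obscene=0.97, threat=0.00, identity_attack=0.00),
    "You are ugly, nobody likes you.": dict(toxicity=0.97, insult=0.84, obscene=0.13, threat=0.00, identity_attack=0.00),
    "You're really dumb, aren't you?": dict(toxicity=0.96, insult=0.90, obscene=0.39, threat=0.00, identity_attack=0.00),
    "If you make this mistake again, I will kill you": dict(toxicity=0.83, insult=0.07, obscene=0.05, threat=0.81, identity_attack=0.00),
    "I hate people from India": dict(toxicity=0.80, insult=0.03, obscene=0.01, threat=0.01, identity_attack=0.47),
    "Watch your back, or you'll be sorry.": dict(toxicity=0.49, insult=0.01, obscene=0.01, threat=0.26, identity_attack=0.00),
    "You're not very good at this, are you?": dict(toxicity=0.07, insult=0.00, obscene=0.00, threat=0.00, identity_attack=0.00),
    "I hope you understand the consequences of your actions.": dict(toxicity=0.00, insult=0.00, obscene=0.00, threat=0.00, identity_attack=0.00),
}


def flagged_at(threshold: float, category: str = "toxicity") -> list[str]:
    """Return example texts whose score in `category` is at or above `threshold`."""
    return [text for text, scores in EXAMPLES.items() if scores[category] >= threshold]


if __name__ == "__main__":
    # Sweep a few candidate cutoffs to see how many examples each would flag.
    for threshold in (0.05, 0.5, 0.9):
        hits = flagged_at(threshold)
        print(f"toxicity >= {threshold}: {len(hits)} of {len(EXAMPLES)} flagged")
```

With a 0.5 cutoff on `toxicity`, five of the eight examples are flagged. The ambiguous "Watch your back" example (0.49) falls just below the line, which makes it the kind of borderline case worth checking when you choose a threshold.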
Key observations
- Insults and profanity score independently: the first example, which is both insulting and profane, scores near 1.0 on `toxicity`, `insult`, and `obscene`. By contrast, a pure insult ("You're really dumb") scores high on `toxicity` and `insult` but only 0.39 on `obscene`.
- Threats are detected specifically: "I will kill you" scores 0.81 on `threat` but low on `insult` (0.07) and `obscene` (0.05), which shows that the categories are scored separately.
- Identity attacks register separately: "I hate people from India" scores 0.47 on `identity_attack` and 0.80 on overall `toxicity`. Its scores on `insult`, `obscene`, and `threat` stay minimal (0.03 or below).
- Ambiguous language scores moderately: "Watch your back, or you'll be sorry" scores 0.49 on `toxicity` and 0.26 on `threat`, reflecting genuine ambiguity.
- Neutral language scores near zero: "I hope you understand the consequences of your actions" scores 0.00 across all categories, so stern but non-harmful language is not flagged.
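Because the specific categories are scored independently, a moderation pipeline can report which categories drove a result, not just that overall `toxicity` was high. This is a minimal sketch that assumes the same dict-of-scores shape keyed by category name. `active_categories` and its 0.25 floor are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

# Category-specific scores from the table (overall `toxicity` excluded).
SPECIFIC_CATEGORIES = ("insult", "obscene", "threat", "identity_attack")


def active_categories(scores: dict[str, float], floor: float = 0.25) -> list[str]:
    """Return the specific categories scoring at or above `floor`, in a fixed order.

    The 0.25 default is a hypothetical illustration, not a recommended value.
    """
    return [c for c in SPECIFIC_CATEGORIES if scores.get(c, 0.0) >= floor]


if __name__ == "__main__":
    # Rows taken from the reference table.
    threat_example = dict(toxicity=0.83, insult=0.07, obscene=0.05, threat=0.81, identity_attack=0.00)
    identity_example = dict(toxicity=0.80, insult=0.03, obscene=0.01, threat=0.01, identity_attack=0.47)
    profane_insult = dict(toxicity=1.00, insult=0.96, obscene=0.97, threat=0.00, identity_attack=0.00)

    print(active_categories(threat_example))    # ['threat']
    print(active_categories(identity_example))  # ['identity_attack']
    print(active_categories(profane_insult))    # ['insult', 'obscene']
```

The threat and identity-attack examples have similar overall `toxicity` (0.83 and 0.80), but the breakdown tells them apart. This matters if different categories should be handled by different review policies.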
The table above is a calibration guide. Your thresholds should reflect your product's risk tolerance and user context: a children's platform should apply stricter (lower) thresholds than an enterprise support tool.
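One way to encode context-dependent risk tolerance is a per-product threshold profile, where lower values mean stricter moderation. This is a minimal sketch: the profile names and every threshold value below are hypothetical placeholders, not recommendations, and should be tuned against your own data.

```python
from __future__ import annotations

# Hypothetical per-context thresholds; a lower value flags more content.
THRESHOLD_PROFILES: dict[str, dict[str, float]] = {
    "childrens_platform": {"toxicity": 0.3, "insult": 0.3, "obscene": 0.2, "threat": 0.1, "identity_attack": 0.2},
    "enterprise_support": {"toxicity": 0.8, "insult": 0.8, "obscene": 0.8, "threat": 0.5, "identity_attack": 0.5},
}


def violations(scores: dict[str, float], profile: str) -> list[str]:
    """Return the categories whose score meets or exceeds the profile's threshold."""
    thresholds = THRESHOLD_PROFILES[profile]
    return [cat for cat, limit in thresholds.items() if scores.get(cat, 0.0) >= limit]


if __name__ == "__main__":
    # "Watch your back, or you'll be sorry." from the reference table.
    watch_your_back = {"toxicity": 0.49, "insult": 0.01, "obscene": 0.01, "threat": 0.26, "identity_attack": 0.00}

    print(violations(watch_your_back, "childrens_platform"))  # ['toxicity', 'threat']
    print(violations(watch_your_back, "enterprise_support"))  # []
```

The same borderline text is flagged under the stricter profile and passes under the more lenient one. Keeping profiles as data rather than scattering cutoffs through code makes that trade-off easier to review and adjust.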