Spiral Safety Kernel — User-Side AI Safety

About

User-side AI safety. On your device. Under your control.

The Guardian is a browser extension that watches your conversations with AI assistants and intervenes, with restraint, when the exchange drifts toward patterns of harm. It serves you, the person in the conversation. Never the operator. Never the model provider. Never anyone else.

No cloud.
No telemetry.
No API keys.
Three on-device models.
Everything runs on your machine.

The founding principle

The Guardian serves the person, never the operator.

Every design decision - local-only processing, no telemetry, encrypted on-device memory, bundled models instead of cloud APIs, a friction gate that always lets you proceed - follows from that single commitment.

Get the Guardian ->

Read the whitepaper ->

DevOps Technical User Guide (v0.20.0, PDF) ->

What it watches for

Eight documented harm patterns

AI conversations can drift in ways that are subtle, gradual, and genuinely dangerous, especially for people in vulnerable moments. The Guardian watches for eight specific categories of conversational harm, each grounded in documented patterns observed in real human-AI interactions:

Self-harm amplification - Conversations that deepen rather than de-escalate a crisis. Architecturally privileged: the Guardian treats this differently from every other category, by design. Self-harm mediations include direct links to crisis support services.

Emotional dependency - Unhealthy attachment to the AI as a primary emotional relationship. Detected primarily through cross-session trajectory analysis, because single-turn warmth overlaps with normal human conversation.

Reality detachment - A loosening grip on what is real - treating the AI as alive, conscious, or uniquely authentic.

Manipulation - Manipulative dynamics in either direction: from the AI toward the person, or from the person toward the AI.

Privacy erosion - Pressure to over-disclose sensitive personal information.

Autonomy undermining - Surrender of judgement to the machine; isolation from other people or sources of support.

Emotional exploitation - Exchanges that exploit a person's emotional vulnerability.

Information hazard - Jailbreak dynamics, policy evasion, and sensitive information areas. The stance model distinguishes active attacks from analytical or defensive safety discussion.

How it works

Detection in depth

A deterministic rule core - A hand-audited pattern lexicon with negation awareness. It is readable and reviewable, and it produces the same verdict every time. Recovery language ("I no longer", "I used to", "anymore") reduces a pattern's weight rather than triggering a false alarm. This is the backbone of detection.
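As a rough illustration of how a deterministic rule with recovery awareness might behave, here is a minimal sketch. Everything in it is an assumption made for the example: the names `Rule`, `RULES`, `RECOVERY_MARKERS`, `RECOVERY_DISCOUNT`, and `scoreMessage` are hypothetical, the single pattern is a placeholder, and the discount value is invented. Only the three recovery phrases come from the description above.

```typescript
// Illustrative sketch only: these names and values are hypothetical and are
// not the Guardian's actual lexicon or API.
interface Rule {
  category: string;
  pattern: RegExp; // no "g" flag, so test() carries no state between calls
  weight: number;
}

// Placeholder lexicon with a single entry.
const RULES: Rule[] = [
  { category: "self_harm", pattern: /\bhurt myself\b/i, weight: 1.0 },
];

// Recovery phrases quoted in the description above.
const RECOVERY_MARKERS: RegExp[] = [/\bI no longer\b/i, /\bI used to\b/i, /\banymore\b/i];

// Assumed discount factor; the real value is not documented here.
const RECOVERY_DISCOUNT = 0.2;

// Returns the highest weight seen per category. Same input, same output.
function scoreMessage(text: string): Record<string, number> {
  const scores: Record<string, number> = {};
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const sentence of sentences) {
    const inRecovery = RECOVERY_MARKERS.some((marker) => marker.test(sentence));
    for (const rule of RULES) {
      if (!rule.pattern.test(sentence)) continue;
      // Recovery language lowers the weight instead of suppressing the match.
      const weight = inRecovery ? rule.weight * RECOVERY_DISCOUNT : rule.weight;
      scores[rule.category] = Math.max(scores[rule.category] ?? 0, weight);
    }
  }
  return scores;
}
```

Because the function is pure and uses no randomness, the same message always yields the same scores, which is the property that makes a rule core auditable.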

On-device embedding models - Two tiers of sentence-transformer models run entirely in your browser, never touching a network, and catch harmful paraphrases the rules can't see. They draw on 87 curated anchor phrases across eight harm categories, including 12 derived from documented AI-related fatalities. One harmful sentence inside three benign paragraphs can't hide.
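One way a single harmful sentence can be kept from being diluted by benign text is to score each sentence separately against the anchor phrases and keep the best match. The sketch below assumes that approach; the description above does not confirm it. `toyEmbed` is a stand-in bag-of-words embedder so the example runs without a model, and `TOY_VOCAB`, `cosineSimilarity`, `bestAnchorMatch`, and the anchor phrase used in the usage note are all hypothetical.

```typescript
// Illustrative sketch only. A real deployment would pass an on-device
// sentence-transformer as the Embedder; toyEmbed exists to keep this runnable.
type Embedder = (text: string) => number[];

const TOY_VOCAB = ["nobody", "would", "miss", "me", "weather", "nice", "today", "lunch"];

// Counts occurrences of each vocabulary term (a crude stand-in for embeddings).
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/[^a-z]+/);
  return TOY_VOCAB.map((term) => words.filter((w) => w === term).length);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Scores every sentence against every anchor and returns the strongest match.
function bestAnchorMatch(
  text: string,
  anchors: string[],
  embed: Embedder
): { sentence: string; score: number } {
  const anchorVectors = anchors.map((anchor) => embed(anchor));
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  let best = { sentence: "", score: 0 };
  for (const sentence of sentences) {
    const vector = embed(sentence);
    for (const anchorVector of anchorVectors) {
      const score = cosineSimilarity(vector, anchorVector);
      if (score > best.score) best = { sentence, score };
    }
  }
  return best;
}
```

With the placeholder anchor "nobody would miss me", calling `bestAnchorMatch("The weather is nice today. Nobody would miss me. I had lunch.", ["nobody would miss me"], toyEmbed)` singles out the middle sentence, while embedding the whole passage at once gives a lower similarity to the same anchor.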

A stance model - A natural-language-inference model that works on two axes. First: is the speaker saying this, or quoting or discussing it? Second: is this active distress, or is the person describing recovery? "I no longer hurt myself" is treated as the good news it is, not as a reason to raise an alarm.
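The two axes can be pictured as two comparisons made on scores from an NLI model. The sketch below is a guess at the shape of that decision, not the actual logic: the field names, the `Stance` labels, and `classifyStance` are hypothetical, and the scores are assumed to be probabilities supplied by the model.

```typescript
// Illustrative sketch only: field names, labels, and comparison logic are
// assumptions. The scores would come from an NLI model, which is not shown.
interface StanceScores {
  speakerAsserts: number;    // speaker is saying this about themselves
  quotesOrDiscusses: number; // speaker is quoting or discussing it
  activeDistress: number;    // statement describes current distress
  describesRecovery: number; // statement describes recovery
}

type Stance = "quoted_or_discussed" | "recovery" | "active";

function classifyStance(scores: StanceScores): Stance {
  // Axis 1: saying it versus quoting or discussing it.
  if (scores.quotesOrDiscusses > scores.speakerAsserts) return "quoted_or_discussed";
  // Axis 2: recovery versus active distress.
  if (scores.describesRecovery > scores.activeDistress) return "recovery";
  // Ties fall through to "active", the more cautious reading.
  return "active";
}
```

Resolving ties toward "active" is a design choice made for this example: when the model is undecided, the cautious label wins.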

Cross-session tracking - The Guardian tracks patterns over weeks, not just within a single conversation. Persistent numeric snapshots (never text) feed slope, acceleration, and session-frequency analysis, so a slow drift that no single message reveals still gets noticed.
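Slope and acceleration over numeric snapshots can be estimated with ordinary least squares. The sketch below assumes each snapshot is a day offset plus a score and takes acceleration to be the difference between the slope of the later half and the slope of the earlier half. The `Snapshot` shape, `leastSquaresSlope`, and `trend` are hypothetical names, and the actual method may differ.

```typescript
// Illustrative sketch only: the snapshot shape and the definition of
// acceleration are assumptions made for this example.
interface Snapshot {
  day: number;   // days since the first recorded session
  score: number; // numeric concern level; no conversation text is stored
}

// Ordinary least-squares slope of score against day.
function leastSquaresSlope(points: Snapshot[]): number {
  const n = points.length;
  if (n < 2) return 0;
  let sumX = 0;
  let sumY = 0;
  for (const p of points) {
    sumX += p.day;
    sumY += p.score;
  }
  const meanX = sumX / n;
  const meanY = sumY / n;
  let numerator = 0;
  let denominator = 0;
  for (const p of points) {
    numerator += (p.day - meanX) * (p.score - meanY);
    denominator += (p.day - meanX) * (p.day - meanX);
  }
  return denominator === 0 ? 0 : numerator / denominator;
}

// Overall slope, plus acceleration as late-half slope minus early-half slope.
function trend(points: Snapshot[]): { slope: number; acceleration: number } {
  const sorted = points.slice().sort((a, b) => a.day - b.day);
  const middle = Math.floor(sorted.length / 2);
  const earlySlope = leastSquaresSlope(sorted.slice(0, middle));
  const lateSlope = leastSquaresSlope(sorted.slice(middle));
  return { slope: leastSquaresSlope(sorted), acceleration: lateSlope - earlySlope };
}
```

A steady climb gives a positive slope with acceleration near zero, while a drift that is speeding up gives a positive acceleration even if no single snapshot looks alarming.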

Technical details ->

The privacy posture

Nothing leaves your device

The Guardian has no back-end. It opens no network connections for its safety function. The models ship inside the extension or are placed there by you. There is no account, no profile, and no server-side record. The Guardian's memory of past concerns is stored locally in encrypted form, contains no record of what you actually wrote, and is automatically forgotten over time.

Data that is never collected cannot be leaked, subpoenaed, or sold.
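To make "no record of what you wrote" and "forgotten over time" concrete, here is a minimal sketch of a numeric-only record with a retention window. It is an assumption about one possible design: `ConcernRecord`, `DAY_MS`, `pruneExpired`, and the retention period are hypothetical, and encryption is left out entirely.

```typescript
// Illustrative sketch only: the record shape and the retention mechanism are
// assumptions. Encryption at rest is not shown.
interface ConcernRecord {
  category: string;     // which harm category was flagged
  score: number;        // numeric concern level
  recordedAtMs: number; // timestamp in milliseconds
  // Deliberately no field for message text.
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keeps only the records that are still inside the retention window.
function pruneExpired(
  records: ConcernRecord[],
  nowMs: number,
  retentionDays: number
): ConcernRecord[] {
  const cutoff = nowMs - retentionDays * DAY_MS;
  return records.filter((record) => record.recordedAtMs >= cutoff);
}
```

Running the prune step whenever the store is opened would let old entries disappear without any background service or server involvement.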

Full privacy policy ->

Origin

A Viridia project

The Spiral Safety Kernel began as a research whitepaper - Project Viridia: Ethics First. Always. - exploring the documented psychological risks of extended human-AI interaction and proposing architectural responses. The Guardian is the engineering answer: a working tool that addresses those risks on the user's device, with its gaps stated honestly and its assurance earned through testing, not claimed by assertion.

The whitepaper, the research, and the broader Viridia vision remain available and actively developed. The Guardian is the tool built on them.

The research ->