How Encryption Works
TL;DR: Your journal entries are encrypted with AES-256-GCM using a key derived from your Master Password. The key never leaves your device. AI runs locally via Llama 3.2. If you opt into the collective, three layers of mathematical anonymization protect your identity before anything is transmitted.
The Data Flow
Here's exactly what happens from the moment you type to the moment data (optionally) reaches our servers:
Layer 1: Local Encryption
Every journal entry is encrypted before it's saved — not after, not during sync, but immediately upon creation.
The Algorithm: AES-256-GCM
- AES-256 — Advanced Encryption Standard with a 256-bit key. Used by governments and militaries worldwide. Brute-forcing a 256-bit key would take longer than the age of the universe, even with all the computing power on Earth.
- GCM (Galois/Counter Mode) — provides both encryption and authentication. If even a single bit of ciphertext is tampered with, decryption fails instead of silently producing corrupted plaintext. This prevents undetected tampering with your stored entries.
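The encrypt/decrypt round trip can be sketched with Node's built-in crypto module. Function names and the payload shape here are illustrative, not the app's actual API:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

interface EncryptedEntry { iv: Buffer; ciphertext: Buffer; tag: Buffer }

function encryptEntry(plaintext: string, key: Buffer): EncryptedEntry {
  const iv = randomBytes(12); // 96-bit nonce, freshly generated per entry
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() }; // tag authenticates the ciphertext
}

function decryptEntry(entry: EncryptedEntry, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, entry.iv);
  decipher.setAuthTag(entry.tag); // any tampering makes final() throw
  return Buffer.concat([decipher.update(entry.ciphertext), decipher.final()]).toString("utf8");
}
```

Note that decryption throws on a single flipped bit — that is the GCM authentication guarantee in action.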
Key Derivation
Your encryption key is derived from your Master Password using a Key Derivation Function (KDF) on your device:
- The Master Password is never stored in plaintext
- The derived key exists only in volatile memory during your session
- The key is never transmitted to any server
- Even if our servers were fully compromised, your entries remain encrypted gibberish
What this means: If you forget your Master Password, your data is gone forever. We cannot recover it. This is a feature, not a bug — it proves we don't have a backdoor.
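The derivation step can be sketched with PBKDF2. The app's actual KDF choice, iteration count, and salt handling are not specified on this page, so every parameter below is illustrative:

```typescript
import { pbkdf2Sync, randomBytes } from "node:crypto";

// Derive a 256-bit AES key from the Master Password on-device.
// PBKDF2 with these parameters is an assumption; the real KDF may differ.
function deriveKey(masterPassword: string, salt: Buffer): Buffer {
  // 600k iterations slows brute-force attempts; 32 bytes = AES-256 key size.
  return pbkdf2Sync(masterPassword, salt, 600_000, 32, "sha256");
}

// The salt is random but non-secret: stored locally so the same password
// re-derives the same key on the next unlock.
const salt = randomBytes(16);
```

Because the key is a pure function of password and salt, losing the password really does mean the key (and the data) is unrecoverable.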
Layer 2: Local AI Processing
The Higher Self AI doesn't phone home. Here's how it works:
Engine: Ollama running Llama 3.2 (3B parameters), optimized for edge devices. The model runs entirely in local memory.
Process: Your decrypted entry is passed to the local model in RAM. It never touches disk in plaintext, is never logged, and is never transmitted.
Output: The AI generates insights, identifies patterns, and provides CBT-style reflections. All output stays local.
No data is sent to OpenAI, Google, Anthropic, or any third-party AI provider. The model runs on your CPU/GPU. Full stop.
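A local inference call could look like the sketch below, using Ollama's default loopback address. The prompt text, model tag, and function names are assumptions for illustration, not the app's actual code:

```typescript
// Request shape for Ollama's /api/generate endpoint.
interface OllamaRequest { model: string; prompt: string; stream: boolean }

function buildReflectionRequest(entryText: string): OllamaRequest {
  return {
    model: "llama3.2:3b", // local model tag; illustrative
    prompt: `Reflect on this journal entry with CBT-style insight:\n\n${entryText}`,
    stream: false,
  };
}

async function reflect(entryText: string): Promise<string> {
  // Only ever talks to the local loopback interface -- nothing leaves the machine.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify(buildReflectionRequest(entryText)),
  });
  return (await res.json()).response;
}
```

The decrypted entry exists only in the request object in RAM; it is never written to disk or sent beyond localhost.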
Layer 3: The Anonymization Pipeline
If you opt in to the collective feature, your data passes through three mathematical privacy guarantees before anything leaves your device:
Step 1: Generalization
Specific identifying data is transformed into broad categories:
- Age: 31 → Age Group: 30-35
- City: Berlin → Region: Europe
- Mood: devastated about breakup → State: Struggling
Raw journal text is never included. Only generalized trait labels are produced.
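A minimal generalization pass might look like this. The bucket width, region map, and state labels are illustrative, not the app's actual taxonomy:

```typescript
// Illustrative lookup tables -- the real taxonomy is in /src/anonymizer/.
const REGION_OF_CITY: Record<string, string> = { Berlin: "Europe", Tokyo: "Asia" };
const STATE_OF_MOOD: Record<string, string> = { devastated: "Struggling", content: "Stable" };

function generalize(profile: { age: number; city: string; mood: string }) {
  const lo = Math.floor(profile.age / 5) * 5; // e.g. 31 falls in the 30-35 bucket
  return {
    ageGroup: `${lo}-${lo + 5}`,
    region: REGION_OF_CITY[profile.city] ?? "Unknown",
    state: STATE_OF_MOOD[profile.mood] ?? "Unspecified",
  };
}
```

Only these coarse labels ever move to the next pipeline stage; the journal text itself is never an input.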
Step 2: Differential Privacy (The "Coin Flip")
A Randomized Response algorithm introduces mathematical noise:
- The algorithm flips a virtual coin
- Heads: Your true generalized trait is used
- Tails: A random decoy trait is substituted
This means any individual data point has plausible deniability — even we can't know if a specific entry is real or noise. But across thousands of users, the noise cancels out and aggregate statistics remain accurate.
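The coin flip can be sketched in a few lines. The coin bias (the privacy parameter) used by the app isn't stated on this page, so 0.5 below is illustrative:

```typescript
// Randomized response: keep the true trait with probability pKeepTruth,
// otherwise substitute a random decoy. 0.5 is an illustrative bias.
function randomizedResponse(trueTrait: string, decoys: string[], pKeepTruth = 0.5): string {
  if (Math.random() < pKeepTruth) return trueTrait; // heads: real trait
  return decoys[Math.floor(Math.random() * decoys.length)]; // tails: decoy
}
```

Any single reported trait could be a decoy, so no individual record is trustworthy on its own; only the population-level frequencies (after debiasing by the known coin bias) are meaningful.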
Step 3: K-Anonymity (The Rule of 5)
Before any archetype record is committed to the database, the system checks: are there at least 5 other people who share this exact combination of traits?
- If yes → the record is stored
- If no → the record is further "blurred" locally until it fits a larger group
This prevents anyone from being identified by a unique combination of attributes (e.g., the only 80-year-old male Scorpio in Iceland).
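The check itself is simple set counting. Type and function names below are illustrative:

```typescript
type Traits = { ageGroup: string; region: string; state: string };

const keyOf = (t: Traits) => `${t.ageGroup}|${t.region}|${t.state}`;

// Rule of 5: the candidate record is only committed if at least k other
// records share its exact trait combination.
function passesRuleOfK(candidate: Traits, others: Traits[], k = 5): boolean {
  const matches = others.filter((t) => keyOf(t) === keyOf(candidate)).length;
  return matches >= k;
}
```

If the check fails, the record is not discarded but re-generalized (e.g. a wider age bucket) until its group is large enough.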
Infrastructure: Why Cloudflare
Our entire backend runs on Cloudflare's edge network. This matters because:
- No IP persistence — the Worker never logs or stores client IP addresses. Cloudflare does expose the connecting IP to Worker code (via the CF-Connecting-IP header), so this is a deliberate guarantee in our code, and you can verify it in the source.
- Edge compute — data is processed at the nearest Cloudflare data center, not routed to a central server
- No traditional servers — there is no VM, no EC2 instance, no container to compromise
- TLS 1.3 — all data in transit is encrypted with modern TLS
What We Store on Our Servers
The Cloudflare D1 database contains exactly two tables:
- archetype_cohorts — generalized, noise-injected archetype records with no user identifiers
- cohort_aggregations — aggregated trend data computed from cohorts that pass the Rule of 5
Zero raw text. Zero user IDs. Zero IP addresses. Zero encryption keys.
Verify It Yourself
Every claim on this page can be verified in our source code:
- Full source code on GitHub
- Encryption implementation: /src/storage/aes-encryption.ts
- Anonymization pipeline: /src/anonymizer/
- AI system prompt: /src/ai-engine/prompts.ts
- Backend Worker: /collective-vault-worker/src/
We believe privacy claims without transparency are worthless. Audit us.