Backup and Recovery
Backups preserve understanding, not execution state. Recovery is manual; nothing is restored automatically.
Purpose of This Section
This section describes how the assistant’s state, knowledge, and operational continuity are preserved over time, and how recovery is performed after failure, compromise, or intentional teardown. The design goal is resilience without privilege resurrection: backups must preserve knowledge and intent without reintroducing risk.
Backup Philosophy
Backups are treated as archives of understanding, not snapshots of execution. The distinction is critical. A traditional system backup captures everything needed to restore a running system to its previous state — including active sessions, cached credentials, and runtime processes. In a security-conscious architecture where authority is deliberately scoped and revocable, restoring all of that state would silently re-establish permissions that may have been compromised or that should have expired.
This architecture avoids that outcome by explicitly excluding runtime processes, active sessions, credentials, secrets, and live authentication state from all backups. What remains — and what is actively preserved — is documentation, sanitized configuration, work artifacts, and decision history. A backup captures what the assistant knew and how it was configured, not what it was authorized to do. Authority must be re-established through the same explicit mechanisms used to grant it in the first place.
What Is Backed Up
The most critical asset is the documentation vault: all Markdown files in the Obsidian vault, including decision logs, architecture notes, and operational summaries. This vault represents the assistant’s durable memory and is the primary artifact that enables continuity across rebuilds.
Configuration files are also preserved, including the assistant’s operational configuration, scheduled tasks, tool definitions, and access rules. All configuration is stored in text form and explicitly excludes secrets. If a configuration file references an API key, the reference is preserved but the key itself is not.
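One way to keep the reference while dropping the key is to sanitize configuration before it is written to the backup. The sketch below is illustrative only: the function name `sanitize_config`, the `<secret-ref:...>` placeholder format, and the key-name pattern are all assumptions, not part of the architecture described here.

```python
import re

# Hypothetical pattern for setting names that hold secrets. A real deployment
# would maintain its own explicit list; this one errs toward over-matching
# (for example, it would also redact a setting named "keyboard_layout").
SECRET_NAME_PATTERN = re.compile(r"key|token|secret|password|credential", re.IGNORECASE)


def sanitize_config(config):
    """Return a copy of a nested dict with secret values replaced by references.

    Only nested dicts are traversed; lists are copied as-is in this sketch.
    """
    sanitized = {}
    for name, value in config.items():
        if isinstance(value, dict):
            sanitized[name] = sanitize_config(value)
        elif SECRET_NAME_PATTERN.search(str(name)):
            # Preserve which secret is needed, not the secret itself.
            sanitized[name] = "<secret-ref:" + str(name) + ">"
        else:
            sanitized[name] = value
    return sanitized


# Example input with an obviously fake key.
example = {
    "model": "assistant-v1",
    "calendar": {"api_key": "sk-not-a-real-key", "poll_minutes": 15},
}
cleaned = sanitize_config(example)
```

After recovery, the placeholder tells the operator which credential has to be reissued, while the backup itself carries nothing that grants access.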
Work artifacts — draft documents, generated reports, and non-sensitive output files — are included to maintain continuity of ongoing work. These are less critical than documentation and configuration but reduce the friction of resuming after a recovery.
What Is Excluded
Certain categories of data are never included in backups: API keys, OAuth tokens, passkeys, SSH keys, vault credentials, and browser session data. These exclusions are enforced through .gitignore rules and manual review. A backup that contains secrets is considered invalid regardless of what else it preserves.
The rationale is the same one that governs secret handling throughout this architecture. A secret committed to a Git repository persists in its history even after the file is deleted in a later commit. Excluding secrets from backups entirely is simpler, more reliable, and more auditable than attempting to include and then scrub them.
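Because .gitignore rules only stop untracked files from being added, a second, independent check on the list of candidate files can complement the manual review. The following is a minimal sketch under stated assumptions: the patterns are examples rather than the actual exclusion rules, and `find_violations` and `backup_is_valid` are hypothetical helper names. A filename check of this kind cannot detect a secret pasted into an otherwise ordinary file, so it does not replace review.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns only; the real rules live in the repository's .gitignore.
EXCLUDED_PATTERNS = [
    "*.pem", "*.key",          # private keys and certificates
    "id_rsa*", "id_ed25519*",  # SSH key pairs
    ".env", ".env.*",          # environment files that often hold API keys
    "*token*",                 # OAuth or session token dumps
    "cookies*",                # browser session data
]


def find_violations(paths):
    """Return the paths whose file name matches any excluded pattern."""
    violations = []
    for path in paths:
        name = PurePosixPath(path).name.lower()
        if any(fnmatchcase(name, pattern) for pattern in EXCLUDED_PATTERNS):
            violations.append(path)
    return violations


def backup_is_valid(paths):
    """A backup containing any excluded file is invalid as a whole."""
    return not find_violations(paths)


candidates = [
    "vault/decisions/backup-policy.md",
    "config/tools.yaml",
    "config/.env",
    "keys/id_ed25519.pub",
]
flagged = find_violations(candidates)
```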
Backup Mechanism
Backups are implemented as Git-based snapshots. A scheduled job collects eligible files, verifies that exclusion rules have been applied, commits the changes to a dedicated repository, and pushes them to the remote hosted under the shared organization. This produces incremental backups with an append-only history, provided history rewriting is disabled on the remote, and clear authorship on every commit. Git serves simultaneously as the storage medium and the audit trail: every backup is a commit, every commit records who made it and when, and the full history of all backups is available for inspection.
Backups are performed on a regular schedule and after significant events such as configuration changes or major architectural decisions. The frequency balances freshness against noise: frequent enough that recent work is never lost, infrequent enough that the backup history remains meaningful rather than cluttered with trivial changes.
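The scheduled job described above could be sketched roughly as follows. This is not the actual job: `run_backup`, `run_git`, and the `find_secrets` callback are hypothetical names, and the sketch assumes the files to back up have already been copied into a local clone whose remote is configured. The Git subcommands used (`add --all`, `diff --cached --name-only`, `reset`, `commit -m`, `push`) are standard.

```python
import subprocess


def run_git(args, repo_dir):
    """Run one git command inside repo_dir and return its stdout.

    Raises subprocess.CalledProcessError if git exits with an error.
    """
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout


def run_backup(repo_dir, message, find_secrets, git=run_git):
    """Stage, verify, commit, and push one backup snapshot.

    find_secrets is a hypothetical callable: it receives the staged paths and
    returns the ones that violate the exclusion rules.
    """
    git(["add", "--all"], repo_dir)
    staged = git(["diff", "--cached", "--name-only"], repo_dir).splitlines()
    if not staged:
        return "nothing-to-commit"

    violations = find_secrets(staged)
    if violations:
        # Unstage everything so a later run cannot commit the files by accident.
        git(["reset"], repo_dir)
        raise RuntimeError("backup aborted, excluded files staged: " + ", ".join(violations))

    git(["commit", "-m", message], repo_dir)
    git(["push"], repo_dir)
    return "pushed"
```

The verification step sits between staging and committing so that an excluded file never enters history, which matters because a later deletion would not remove it. The `git` parameter exists only so the flow can be exercised without a real repository.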
Recovery Model
Recovery is deliberate and manual. There is no automatic restore of execution state. The process consists of provisioning a new, clean environment; reinstalling the assistant software; restoring documentation and configuration from backup; reissuing credentials manually through the standard provisioning process; and validating behavior before resuming normal operation.
Each of these steps is intentional. Provisioning a clean environment ensures that any compromise present in the previous VM does not carry over. Reinstalling rather than restoring the software allows the operator to upgrade or change versions during recovery. Reissuing credentials forces a fresh authentication cycle, which means that any compromised tokens from the previous instance are irrelevant. And validating behavior before resumption gives the operator a checkpoint — a moment to confirm that the restored assistant is functioning as expected before it resumes acting on delegated authority.
Recovery, in this architecture, is an act of intent rather than an act of inertia. The system does not drift back into operation; it is consciously restarted.
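The ordering of the recovery steps can be made explicit with a small checklist that refuses to skip ahead and records which operator confirmed each step. This is a sketch for illustration, assuming a hypothetical `RecoveryChecklist` helper; the architecture itself only requires that the steps be carried out deliberately and by a person.

```python
# Step names paraphrase the recovery process described above.
RECOVERY_STEPS = (
    "provision clean environment",
    "reinstall assistant software",
    "restore documentation and configuration from backup",
    "reissue credentials manually",
    "validate behavior",
)


class RecoveryChecklist:
    """Tracks manual confirmation of recovery steps in a fixed order."""

    def __init__(self, steps=RECOVERY_STEPS):
        self._steps = tuple(steps)
        self._confirmed = []  # list of (step, operator) pairs, in order

    def next_step(self):
        """Return the next unconfirmed step, or None when all are done."""
        if len(self._confirmed) < len(self._steps):
            return self._steps[len(self._confirmed)]
        return None

    def confirm(self, step, operator):
        """Record that an operator completed a step; reject out-of-order steps."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self._confirmed.append((step, operator))

    def may_resume(self):
        """Normal operation may resume only after every step is confirmed."""
        return self.next_step() is None
```

Nothing in the checklist performs a step on its own; it only records that a person did, which keeps recovery an act of intent.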
Incident Response Integration
When recovery is triggered by an incident rather than a routine rebuild, additional steps apply. All credentials associated with the previous instance are revoked before any restoration begins. Network access is reassessed to determine whether the previous configuration remains appropriate. Backup integrity is verified to ensure that the backup itself was not tampered with during the incident. Only after these steps are complete is restoration attempted. Recovery never bypasses incident response — it follows from it.
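The rule that restoration follows from incident response can be expressed as a gate that lists what is still outstanding. The names `IncidentStatus` and `restoration_blockers` are hypothetical, and how each precondition is actually established (for example, how backup integrity is verified) is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentStatus:
    """Operator-recorded state of the incident response preconditions."""
    credentials_revoked: bool = False
    network_reassessed: bool = False
    backup_integrity_verified: bool = False


def restoration_blockers(status):
    """Return the preconditions still outstanding; empty means restore may begin."""
    blockers = []
    if not status.credentials_revoked:
        blockers.append("revoke credentials of the previous instance")
    if not status.network_reassessed:
        blockers.append("reassess network access")
    if not status.backup_integrity_verified:
        blockers.append("verify backup integrity")
    return blockers
```

All flags default to `False`, so a status object that nobody has filled in blocks restoration rather than permitting it.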
Replaceability
Because backups preserve understanding rather than execution state, the assistant is not bound to any particular version of itself. A backup from one version can be restored into a different version. The architecture can evolve between teardown and rebuild. Mistakes in the previous configuration can be corrected during recovery rather than fossilized by it. This supports long-term improvement without lock-in and reinforces the principle that the assistant is a replaceable component, not an irreplaceable institution.
Tolerated Failures
The system is designed to tolerate total VM loss, disk corruption, software defects, and intentional teardown. Any of these events triggers the same recovery process, and none of them results in permanent loss of the assistant’s knowledge or operational history. What the system does not tolerate is silent resurrection of compromised state — a restore that brings back not just what the assistant knew, but what it was authorized to do, without the operator’s conscious reauthorization.
Summary
By designing backups around documentation and configuration rather than live state, the architecture achieves safe recoverability, strong auditability, clean reboots, and controlled continuity. Backups preserve meaning — the decisions, rationale, and context that make the assistant useful — without preserving the authority that makes it potentially dangerous.
This section defines how the system survives failure. The next section addresses update control, change management, and lifecycle governance built on top of this recovery model.