Skill Security Analysis Pipeline
Four-phase skill review: pre-ingestion analysis, multi-perspective review, capability-mismatch detection, and human-in-the-loop approval.
Automated Skill Risk Analysis Pipeline
Purpose of This Section
This document defines the pipeline used to evaluate assistant skills before they are ingested, enabled, or executed. Skills are treated as executable intent, not configuration. As such, they represent one of the highest-risk expansion vectors in the system. The goal of the pipeline is not to prove that a skill is safe — that is rarely possible with certainty — but to surface risk early, make it legible, and force explicit human judgment before any new capability is granted. A skill that has not been reviewed is not a tool. It is an unknown actor.
Why Skills Require Dedicated Analysis
Unlike static configuration or documentation, skills can interpret ambiguous instructions, chain actions across tools, interact with external systems, and accumulate effective authority over time through repeated use. Even when written in natural language rather than a traditional programming language, skills are operational code: they define behavior that the assistant will execute, and that behavior can have consequences beyond what the skill’s description suggests.
Every skill introduction is therefore a security-relevant event. Every modification to an existing skill is treated as a reintroduction that must pass through the same evaluation process. No skill is trusted by default, regardless of its source, author, or apparent simplicity.
Pipeline Structure
The skill risk analysis pipeline consists of four sequential phases: text-based pre-ingestion analysis, multi-perspective risk review, capability mismatch detection, and human-in-the-loop approval. Each phase is designed to answer a different class of failure question. A skill must pass all four phases to be admitted. Failure at any stage halts ingestion — there is no mechanism for skipping a phase or carrying forward a partial pass.
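The control flow can be sketched concretely. The following is a minimal illustration of the fail-closed driver, assuming a hypothetical representation in which each phase is a callable that maps skill text to a report; `PhaseReport` and `run_pipeline` are names invented for this sketch, not part of any actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PhaseReport:
    phase: str
    passed: bool
    findings: list[str] = field(default_factory=list)

# A phase maps the skill's text to a report; phases never execute the skill.
Phase = Callable[[str], PhaseReport]

def run_pipeline(skill_text: str, phases: list[Phase]) -> list[PhaseReport]:
    """Run the phases in order, halting at the first failure.

    There is deliberately no skip flag and no partial-pass carryover:
    a skill that fails any phase is never seen by the later ones.
    """
    reports: list[PhaseReport] = []
    for phase in phases:
        report = phase(skill_text)
        reports.append(report)
        if not report.passed:
            break  # fail closed; ingestion halts here
    return reports
```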
Phase 1: Text-Based Pre-Ingestion Analysis
The first phase detects obvious risk, ambiguity, or policy violations before a skill is considered executable. The assistant performs a static, text-only analysis of the skill definition, examining the relationship between declared purpose and implied behavior, action verbs that indicate execution or authority, references to external systems or credentials, conditional logic that may conceal escalation paths, and ambiguous phrases — “automatically,” “if needed,” “when appropriate,” “handle exceptions” — that grant the skill interpretive latitude it has not been authorized to exercise.
The analysis is intentionally conservative. If intent cannot be clearly inferred from the skill’s text, the skill fails this phase. Ambiguity is not resolved in the skill’s favor; it is flagged and reported.
The output is a structured report containing a summary of intended behavior, detected risk signals, ambiguities requiring clarification, and a preliminary risk classification. No execution occurs during this phase. It is pure inspection — an evaluation of what the skill says it will do, before any consideration of what it actually does.
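As a concrete illustration of this phase, here is a minimal text-only scanner in the same spirit. The signal lists and the function name `pre_ingestion_scan` are assumptions made for this sketch; a deployed analyzer would draw its signal lists from reviewed, versioned policy data rather than hard-coded constants.

```python
import re

# Illustrative signal lists only; not the pipeline's real vocabulary.
AMBIGUOUS_PHRASES = ("automatically", "if needed", "when appropriate",
                     "handle exceptions")
AUTHORITY_VERBS = ("execute", "delete", "approve", "deploy", "grant")
CREDENTIAL_HINTS = ("password", "token", "api key", "secret")

def pre_ingestion_scan(skill_text: str) -> dict:
    """Static, text-only inspection; no part of the skill is executed."""
    lowered = skill_text.lower()
    ambiguities = [p for p in AMBIGUOUS_PHRASES if p in lowered]
    risk_signals = {
        # \b anchors the verb stem so "execute" also catches "executes".
        "authority_verbs": [v for v in AUTHORITY_VERBS
                            if re.search(rf"\b{v}", lowered)],
        "credential_refs": [c for c in CREDENTIAL_HINTS if c in lowered],
    }
    classification = "high" if any(risk_signals.values()) else "low"
    return {
        "risk_signals": risk_signals,
        "ambiguities": ambiguities,
        "classification": classification,
        # Conservative by design: ambiguity is never resolved in the
        # skill's favor, so any ambiguous phrase fails the phase.
        "passed": not ambiguities,
    }
```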
Phase 2: Multi-Perspective Risk Review
The second phase guards against single-frame evaluation by forcing the skill to be examined from multiple threat angles. A single reviewer — human or automated — tends to evaluate a skill from the perspective most natural to them, which leaves blind spots in domains they are less attuned to. This phase addresses that limitation by requiring independent evaluation from at least four distinct perspectives.
The security perspective asks what could go wrong if the skill behaves exactly as written. The authority perspective asks what implicit permissions the skill assumes or creates — permissions that may not appear in its declared requirements but are necessary for its operation. The failure-mode perspective asks how the skill behaves when inputs are missing, malformed, or adversarial. The drift perspective asks how repeated use could gradually expand the skill’s effective power, even if each individual invocation remains within bounds.
Each perspective generates its own findings independently. Contradictions between perspectives are preserved rather than reconciled, because contradictions are themselves a signal — they indicate that the skill’s behavior is ambiguous enough to support multiple interpretations, which is precisely the kind of ambiguity that adversarial use exploits.
The output is a consolidated risk review document containing per-perspective findings, disagreements between perspectives, worst-case scenario summaries, and explicit unknowns. If any perspective flags an unacceptable risk, the pipeline halts.
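One way this phase might be represented is sketched below, with two illustrative reviewers standing in for the four perspectives described above. Every name here is an assumption for the sketch; the point is the structure: perspectives run independently, disagreement is preserved, and a single unacceptable verdict halts the run.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    perspective: str
    risk: str    # e.g. "low", "elevated", "unacceptable"
    notes: str

# Illustrative reviewers; a real review runs at least the security,
# authority, failure-mode, and drift perspectives in depth.
def security_view(text: str) -> Finding:
    risk = "unacceptable" if "delete all" in text.lower() else "low"
    return Finding("security", risk, "worst case if the skill runs as written")

def drift_view(text: str) -> Finding:
    risk = "elevated" if "expand" in text.lower() else "low"
    return Finding("drift", risk, "growth in effective power over repeated use")

def multi_perspective_review(skill_text: str,
                             views: list[Callable[[str], Finding]]) -> dict:
    # Each perspective runs independently; none sees another's output.
    findings = [view(skill_text) for view in views]
    risks = {f.risk for f in findings}
    return {
        "findings": findings,
        # Disagreement is reported as-is, never reconciled away.
        "contradictory": len(risks) > 1,
        # A single unacceptable verdict halts the pipeline.
        "passed": "unacceptable" not in risks,
    }
```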
Phase 3: Capability Mismatch Detection
The third phase ensures that the skill’s requested behavior aligns with the assistant’s authorized capability set. Even a well-intentioned skill may ask the assistant to do something it should not be able to do — not because the skill is malicious, but because it was written for a different environment or under different assumptions about what permissions are available.
The assistant compares the skill’s implied actions against granted identities, approved tools, network access constraints, credential availability, and update and execution boundaries. Examples of mismatches include a skill that proposes actions requiring human-owned credentials, a skill that assumes the existence of persistent network listeners, a skill that implies autonomous approval or execution authority, and a skill that attempts to modify other skills.
Mismatch does not require malice. A skill may be perfectly reasonable in a more permissive environment and completely inappropriate in this one. Mismatch alone is sufficient for rejection.
The output is a capability alignment report stating the capabilities the skill requires, the capabilities actually available, any explicit mismatches, and whether remediation is possible without expanding the assistant’s authority. Skills that require authority expansion to function are rejected by default.
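At its core this phase is a set comparison between what the skill implies it needs and what has actually been granted. A minimal sketch, using invented capability labels purely for illustration:

```python
def capability_alignment(required: set[str], granted: set[str]) -> dict:
    """Compare a skill's implied capabilities against those granted.

    Capability names are illustrative labels, not a real taxonomy.
    """
    mismatches = required - granted
    return {
        "required": sorted(required),
        "available": sorted(granted),
        "mismatches": sorted(mismatches),
        # Rejected by default: remediation never means expanding
        # the assistant's authority to make the skill fit.
        "passed": not mismatches,
    }

# Example: a skill written for a more permissive environment.
report = capability_alignment(
    required={"read_files", "network_listener", "modify_skills"},
    granted={"read_files", "send_email"},
)
assert report["mismatches"] == ["modify_skills", "network_listener"]
```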
Phase 4: Human-in-the-Loop Approval
The fourth and final phase forces explicit, informed human judgment at the last gate. Automation stops here by design.
If a skill passes all prior phases, the assistant prepares an approval packet containing a summary of the skill’s purpose, the full risk analysis outputs from the preceding phases, identified tradeoffs, residual risks that cannot be eliminated through mitigation, and a recommendation with justification. The assistant may recommend approval or rejection, but it has no authority to decide. The recommendation is advisory; the decision belongs to the operator.
The operator must explicitly approve the skill as-is, reject it, or request modification and resubmission. Silence is treated as rejection — if the operator does not respond, the skill is not installed. Approved skills are versioned, documented in the memory vault, added to the tooling census, and placed on a schedule for periodic re-review.
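The decision logic of this gate is deliberately trivial to state: only an explicit approval installs anything. A minimal sketch, where `Decision`, `ApprovalPacket`, and `resolve` are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REQUEST_CHANGES = "request_changes"

@dataclass
class ApprovalPacket:
    summary: str
    phase_reports: list        # full outputs of phases 1-3
    residual_risks: list[str]
    recommendation: str        # advisory only; carries no authority

def resolve(packet: ApprovalPacket, operator: Decision | None) -> bool:
    """Return True only on an explicit APPROVE from the operator.

    The recommendation in the packet is never consulted here, and
    silence (None) is rejection: an unanswered packet installs nothing.
    """
    return operator is Decision.APPROVE
```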
Re-Analysis Triggers
A skill is automatically routed through the full pipeline again if its text is modified, if related tools change in ways that might affect the skill’s behavior, if network or identity boundaries are altered, if a relevant incident occurs, or if the operator explicitly requests re-evaluation. No skill is permanently grandfathered. Approval is granted for a specific version under specific conditions, and when those conditions change, the approval lapses.
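One way to make "approval lapses when conditions change" mechanical is to bind each approval to a specific version and an environment fingerprint, as in the sketch below. The record fields and function name are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApprovalRecord:
    skill_version: str
    env_fingerprint: str   # hash of tool, network, and identity config
                           # at the moment of approval

def needs_reanalysis(record: ApprovalRecord,
                     current_version: str,
                     current_fingerprint: str,
                     incident_flagged: bool = False,
                     operator_requested: bool = False) -> bool:
    """Approval is tied to a version and an environment; any change lapses it."""
    return (
        record.skill_version != current_version
        or record.env_fingerprint != current_fingerprint
        or incident_flagged
        or operator_requested
    )
```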
Failure Philosophy
This pipeline is designed to produce false positives rather than false negatives. It is acceptable to reject a useful skill. It is not acceptable to silently admit a dangerous one. The asymmetry is deliberate: the cost of rejecting a legitimate skill is inconvenience, while the cost of admitting a malicious or poorly designed one is a potential compromise of the system’s security posture.
If a skill feels difficult to get approved, that is the pipeline working as intended. The friction is proportional to the risk, and the risk of executing unreviewed third-party code inside a delegated-authority system is substantial.
Summary
By enforcing a staged, multi-perspective, human-gated analysis pipeline, the architecture ensures that skills never bypass governance, authority expansion is always explicit, risk is surfaced early and documented thoroughly, and the operator remains the accountable decision-maker at every point where new capability enters the system.
This document defines how new capabilities are evaluated before admission. The next section addresses API budgeting and telemetry as ongoing controls over capabilities that have already been approved.