AZ-035 — Key Rotation, Key Compromise and Recovery Protocol v1
Status
This document defines the protocol for:
- key classification;
- normal key rotation;
- emergency rotation;
- key compromise;
- revocation, replacement and re-authorization of roles;
- recovery after key-related incidents.
After AZ-001 through AZ-034, the following already exist:
- the specification of the protocol and its subsystems;
- the security model;
- governance, incident response and recovery;
- launch discipline, monitoring, review and archive;
- the upgrade and hard fork protocol.
AZ-035 answers the question: how do we manage the lifecycle of keys and their associated authorizations so that compromise, expiry, rotation or loss of keys does not produce operational chaos, role confusion or continued operation under unsafe identities?
The purpose of this document is to fix:
- the key classes and their purposes;
- the rules for planned rotation;
- emergency rotation after suspected or confirmed compromise;
- revocation and re-authorization of roles;
- the effects on consensus, witness, governance, launch and recovery;
- the canonical objects and the auditable record for all these transitions.
This document builds on:
- AZ-002 through AZ-034, with direct emphasis on AZ-006, AZ-009, AZ-010, AZ-015, AZ-020, AZ-025, AZ-030 and AZ-034.
Terms:
- MUST = mandatory
- MUST NOT = forbidden
- SHOULD = strongly recommended
- MAY = optional
1. Objective
AZ-035 answers 10 critical questions:
- What types of keys exist in the ATLAS ZERO ecosystem?
- What does normal rotation mean versus emergency rotation?
- What does suspected compromise mean versus confirmed compromise?
- How is a key or a role revoked?
- How is a new key introduced without ambiguity?
- How are the proposer, verifier, notary, issuer and governance signer roles affected?
- What happens if a key is lost, corrupted or misused?
- How is operational and protocol-level recovery performed after a compromise?
- Which objects, manifests and attestations preserve the audit trail of these changes?
- How do we avoid both unnecessary downtime and continued operation under unsafe keys?
2. Principles
2.1 Keys are role-bearing trust objects
Keys MUST be treated as trust objects bound to specific roles and policies, not merely as secret bytes.
2.2 Role identity and key material are related but not identical
The identity of a role can survive a key rotation, but this link MUST be explicit, not implicit.
2.3 Compromise handling must fail closed for critical scopes
If there is serious suspicion or confirmation of compromise for critical keys, the system SHOULD favor fail-closed behavior and controlled revocation or quarantine.
2.4 Rotation must be auditable
Every rotation, revocation, replacement or re-authorization MUST leave auditable objects and evidence.
2.5 Emergency speed does not remove explicitness
Emergency rotation MAY accelerate the steps, but MUST NOT eliminate:
- incident classification,
- revocation/replacement objects,
- scope delimitation,
- and traceability.
2.6 Minimize unsafe overlap
Periods in which two keys appear simultaneously active for the same role SHOULD be minimized and strictly defined.
3. Key taxonomy
3.1 Recommended key classes
ATLAS ZERO SHOULD distinguish at least:
- KEY_VALIDATOR_IDENTITY
- KEY_PROPOSER
- KEY_VERIFIER
- KEY_NOTARY
- KEY_WITNESS_ISSUER
- KEY_ORACLE_ISSUER
- KEY_GOVERNANCE_SIGNER
- KEY_RELEASE_SIGNER
- KEY_GENESIS_CUSTODIAN
- KEY_ARCHIVE_ATTESTATION
- KEY_OPERATOR_ADMIN_LOCAL
- KEY_RECOVERY_OPERATOR
3.2 Meaning
KEY_VALIDATOR_IDENTITY
Binds the validator to its protocol identity.
KEY_PROPOSER / VERIFIER / NOTARY
Role keys in consensus.
KEY_WITNESS_ISSUER / KEY_ORACLE_ISSUER
Keys for issuing witnesses/claims in controlled domains.
KEY_GOVERNANCE_SIGNER
Key for review, voting or activation where the model requires it.
KEY_RELEASE_SIGNER / GENESIS_CUSTODIAN / ARCHIVE_ATTESTATION
Keys for artifacts and operational provenance.
KEY_OPERATOR_ADMIN_LOCAL
Key for local service control, with no direct protocol authority.
KEY_RECOVERY_OPERATOR
Key for recovery processes, if policy allows.
3.3 Rule
Policy MUST clearly define what authority each key class has.
4. Key state model
4.1 Standard states
Keys SHOULD move through states such as:
- KS_REGISTERED
- KS_PENDING_ACTIVATION
- KS_ACTIVE
- KS_GRACE_ACTIVE
- KS_QUARANTINED
- KS_REVOKED
- KS_SUPERSEDED
- KS_EXPIRED
- KS_RECOVERED
- KS_ARCHIVED
4.2 Meaning
KS_PENDING_ACTIVATION
The key has been introduced but does not yet hold active authority.
KS_GRACE_ACTIVE
The old or new key is in a controlled transition window, if policy allows.
KS_QUARANTINED
The key is no longer trusted for normal operations until the situation is clarified.
KS_REVOKED
Authority has been explicitly withdrawn.
KS_SUPERSEDED
The key has been replaced by another valid key.
4.3 Rule
A key whose compromise is confirmed MUST NOT remain ACTIVE.
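The rule in 4.3 can be expressed as a simple invariant check. The following is a minimal sketch, not a normative implementation: the state names come from section 4.1, while the helper `violates_rule_4_3` and the choice to count KS_GRACE_ACTIVE as "active" are assumptions made for illustration.

```python
from enum import Enum


class KeyState(Enum):
    # State names are taken from section 4.1.
    KS_REGISTERED = "KS_REGISTERED"
    KS_PENDING_ACTIVATION = "KS_PENDING_ACTIVATION"
    KS_ACTIVE = "KS_ACTIVE"
    KS_GRACE_ACTIVE = "KS_GRACE_ACTIVE"
    KS_QUARANTINED = "KS_QUARANTINED"
    KS_REVOKED = "KS_REVOKED"
    KS_SUPERSEDED = "KS_SUPERSEDED"
    KS_EXPIRED = "KS_EXPIRED"
    KS_RECOVERED = "KS_RECOVERED"
    KS_ARCHIVED = "KS_ARCHIVED"


# Assumption: both plain and grace activity count as "active" for rule 4.3.
ACTIVE_STATES = {KeyState.KS_ACTIVE, KeyState.KS_GRACE_ACTIVE}


def violates_rule_4_3(state: KeyState, compromise_confirmed: bool) -> bool:
    """Return True if a confirmed-compromised key still holds active authority."""
    return compromise_confirmed and state in ACTIVE_STATES
```

A monitoring job could run such a check over every registered key and raise an alert on any violation.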
5. Role binding model
5.1 Need
A role must be bound to a key or a set of keys through explicit objects.
5.2 Canonical structure
RoleKeyBinding {
  binding_id
  role_class
  role_scope_hash
  actor_identity_ref
  key_ref
  activation_boundary
  expiration_boundary?
  binding_status
}
5.3 binding_status
- BINDING_PENDING
- BINDING_ACTIVE
- BINDING_GRACE
- BINDING_REVOKED
- BINDING_SUPERSEDED
5.4 Rule
The role binding MUST be the source of truth for a key's activity in that role, not the mere local presence of the key.
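As an illustration of rule 5.4, an authority check can be written so that it consults only the binding object. This is a sketch under stated assumptions: field names follow section 5.2, but the concrete types, integer boundaries, the exclusive expiration boundary, the acceptance of BINDING_GRACE, and the helper name `key_may_act` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleKeyBinding:
    # Field names follow section 5.2; the concrete types are assumptions.
    binding_id: str
    role_class: str
    role_scope_hash: str
    actor_identity_ref: str
    key_ref: str
    activation_boundary: int            # assumed to be an integer boundary (e.g. an epoch)
    expiration_boundary: Optional[int]  # assumed exclusive when present
    binding_status: str


def key_may_act(binding: RoleKeyBinding, key_ref: str, boundary: int) -> bool:
    """Decide authority from the binding alone, never from local key presence."""
    if binding.key_ref != key_ref:
        return False
    # Assumption: a binding in grace still carries authority.
    if binding.binding_status not in ("BINDING_ACTIVE", "BINDING_GRACE"):
        return False
    if boundary < binding.activation_boundary:
        return False
    if binding.expiration_boundary is not None and boundary >= binding.expiration_boundary:
        return False
    return True
```

The point of the design is that a key file sitting on a signer host proves nothing: only a matching, active binding inside its boundaries grants authority.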
6. Key registration object
6.1 Canonical structure
KeyRegistrationRecord {
  version_major
  version_minor
  key_registration_id
  key_class
  public_key_ref
  owner_identity_ref
  policy_ref?
  metadata_hash?
  registration_time_unix_ms
  attestation_refs?
}
6.2 Rule
A key SHOULD be registered before it can be activated in any critical role scope.
7. Rotation classes
7.1 Standard rotation classes
- ROT_PLANNED
- ROT_SCHEDULED_EXPIRY
- ROT_HYGIENE
- ROT_INFRA_REFRESH
- ROT_SUSPECTED_COMPROMISE
- ROT_CONFIRMED_COMPROMISE
- ROT_KEY_LOSS
- ROT_POLICY_CHANGE
- ROT_ROLE_SPLIT
- ROT_POST_INCIDENT_RECOVERY
7.2 Rule
Rotation class MUST be explicit because procedures and urgency differ materially.
8. Key compromise classification
8.1 Standard compromise levels
- KC_NONE
- KC_SUSPECTED
- KC_PROBABLE
- KC_CONFIRMED
- KC_CONFIRMED_EXPLOITED
8.2 Meaning
KC_SUSPECTED
Incomplete signals, but sufficient for heightened caution.
KC_PROBABLE
The indications are strong, but not fully conclusive.
KC_CONFIRMED
The compromise is confirmed.
KC_CONFIRMED_EXPLOITED
The compromised key has already been misused, or there is evidence of adversarial use.
8.3 Rule
Escalation and revocation behavior SHOULD depend on compromise level and key criticality.
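One way to make rule 8.3 concrete is a small decision function over compromise level and criticality. The sketch below is illustrative only: the level names come from section 8.1, but the action labels ("none", "monitor", "quarantine", "revoke") and the specific mapping are assumptions, not part of this specification.

```python
from enum import IntEnum


class CompromiseLevel(IntEnum):
    # Ordered as listed in section 8.1, lowest to highest.
    KC_NONE = 0
    KC_SUSPECTED = 1
    KC_PROBABLE = 2
    KC_CONFIRMED = 3
    KC_CONFIRMED_EXPLOITED = 4


def containment_action(level: CompromiseLevel, critical: bool) -> str:
    """Map (compromise level, key criticality) to a hypothetical containment action."""
    if level >= CompromiseLevel.KC_CONFIRMED:
        # Rule 4.3: a key with confirmed compromise cannot stay active.
        return "revoke"
    if level == CompromiseLevel.KC_NONE:
        return "none"
    # Suspected or probable: fail closed for critical scopes (principle 2.3).
    if critical:
        return "quarantine"
    # Assumed policy for non-critical keys: quarantine only when probable.
    return "quarantine" if level == CompromiseLevel.KC_PROBABLE else "monitor"
```

A real deployment would read this mapping from the per-class key policy rather than hard-coding it.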
9. Key compromise object
9.1 Canonical structure
KeyCompromiseRecord {
  version_major
  version_minor
  compromise_id
  key_ref
  key_class
  compromise_level
  affected_role_scopes_root
  evidence_root?
  detected_at_unix_ms
  reporter_policy_ref?
  status
}
9.2 status
- OPEN
- CONTAINING
- REVOKED
- RECOVERING
- RESOLVED
- ARCHIVED
9.3 Rule
Critical compromise records MUST be linkable to incident response and decision ledger.
10. Planned rotation flow
10.1 Canonical order
- register replacement key
- attest replacement key provenance if required
- create new pending role binding
- announce or publish operator/release guidance if needed
- activate new binding at boundary
- move old binding to grace or superseded
- revoke old binding at end of grace if used
- archive rotation evidence
10.2 Rule
Planned rotation SHOULD avoid ambiguous simultaneous active authority unless policy explicitly allows bounded overlap.
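The canonical order in 10.1 lends itself to a mechanical check over an execution log. The following sketch assumes hypothetical step identifiers and treats the steps qualified with "if required", "if needed" or "if used" as optional; it is an illustration, not a prescribed implementation.

```python
from typing import Sequence

PLANNED_ROTATION_STEPS = [
    # (hypothetical step name, optional?) in the canonical order of section 10.1.
    ("register_replacement_key", False),
    ("attest_replacement_provenance", True),
    ("create_pending_binding", False),
    ("publish_operator_guidance", True),
    ("activate_new_binding", False),
    ("move_old_binding_to_grace_or_superseded", False),
    ("revoke_old_binding_after_grace", True),
    ("archive_rotation_evidence", False),
]


def follows_canonical_order(executed: Sequence[str]) -> bool:
    """Check that executed steps are known, in order, and skip only optional steps."""
    position = 0
    for name in executed:
        # Advance through the canonical list until this step is found.
        while position < len(PLANNED_ROTATION_STEPS):
            step, optional = PLANNED_ROTATION_STEPS[position]
            position += 1
            if step == name:
                break
            if not optional:
                return False  # a mandatory step was skipped
        else:
            return False  # unknown, repeated or out-of-order step
    # Every step not yet reached must be optional.
    return all(optional for _, optional in PLANNED_ROTATION_STEPS[position:])
```

Such a check could run when the rotation evidence is archived, so that an out-of-order rotation is caught before it is accepted as complete.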
11. Emergency rotation flow
11.1 Canonical order
- classify compromise level
- quarantine risky role or disable signing immediately where needed
- issue compromise record
- revoke or quarantine old key binding
- activate replacement key under emergency path if available
- notify operators and relevant governance/security roles
- collect post-rotation evidence
- run recovery and stabilization checks
11.2 Rule
For critical roles, emergency rotation SHOULD prioritize safe stop over continuity if both cannot be preserved.
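The emergency order in 11.1, combined with rule 11.2, can be sketched as a plan builder in which containment always precedes any attempt to restore continuity. Step names and the function `emergency_rotation_plan` are hypothetical and only illustrate the ordering.

```python
from typing import List


def emergency_rotation_plan(replacement_available: bool) -> List[str]:
    """Return the ordered emergency steps of section 11.1 for one key."""
    plan = [
        "classify_compromise_level",
        "quarantine_role_or_disable_signing",
        "issue_compromise_record",
        "revoke_or_quarantine_old_binding",
    ]
    if replacement_available:
        plan.append("activate_replacement_under_emergency_path")
    # Without a replacement the role stays stopped: safe stop over continuity.
    plan += [
        "notify_operators_and_governance_security_roles",
        "collect_post_rotation_evidence",
        "run_recovery_and_stabilization_checks",
    ]
    return plan
```

Note that the absence of a replacement key never removes the containment or evidence steps; it only removes the activation step.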
12. Grace periods
12.1 Need
Some rotations may need bounded overlap.
12.2 Rule
Grace periods MAY exist only if:
- the role permits it;
- overlap semantics are explicit;
- monitoring can distinguish which key is used;
- ambiguity and replay risks are controlled.
12.3 Rule
Compromised or probably compromised keys SHOULD NOT receive generous grace windows.
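The four conditions in 12.2 are conjunctive, which a gate function makes explicit. This is a minimal sketch; the type `GraceConditions` and its field names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraceConditions:
    # One flag per condition listed in section 12.2 (names are hypothetical).
    role_permits_overlap: bool
    overlap_semantics_explicit: bool
    monitoring_distinguishes_keys: bool
    ambiguity_and_replay_controlled: bool


def grace_period_allowed(c: GraceConditions) -> bool:
    """A grace period is allowed only if all four conditions hold."""
    return all((
        c.role_permits_overlap,
        c.overlap_semantics_explicit,
        c.monitoring_distinguishes_keys,
        c.ambiguity_and_replay_controlled,
    ))
```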
13. Key revocation object
13.1 Canonical structure
KeyRevocationRecord {
  version_major
  version_minor
  revocation_id
  key_ref
  key_class
  revoked_role_scopes_root
  revocation_reason_class
  effective_boundary
  issuer_policy_ref
  evidence_root?
  signature_envelopes
}
13.2 revocation_reason_class examples
- routine_rotation
- key_expiry
- suspected_compromise
- confirmed_compromise
- confirmed_exploitation
- loss_of_control
- policy_violation
- role_decommissioned
13.3 Rule
Revocation MUST be explicit and scope-bound.
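A structural validation pass can enforce that a revocation is explicit and scope-bound before it is accepted. The sketch below mirrors the fields of section 13.1; the Python types, the helper `revocation_problems` and the specific checks are assumptions, and signature verification itself is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class KeyRevocationRecord:
    # Fields follow section 13.1; the concrete types are assumptions.
    version_major: int
    version_minor: int
    revocation_id: str
    key_ref: str
    key_class: str
    revoked_role_scopes_root: str
    revocation_reason_class: str
    effective_boundary: int
    issuer_policy_ref: str
    evidence_root: Optional[str]
    signature_envelopes: Tuple[bytes, ...]


def revocation_problems(record: KeyRevocationRecord) -> List[str]:
    """List structural reasons why a revocation is not explicit and scope-bound."""
    problems = []
    if not record.key_ref:
        problems.append("missing key_ref")
    if not record.revoked_role_scopes_root:
        problems.append("not scope-bound")
    if not record.revocation_reason_class:
        problems.append("missing reason class")
    if not record.signature_envelopes:
        problems.append("unsigned")
    return problems
```

An empty result means the record is structurally acceptable; it does not mean the signatures or the issuer policy have been verified.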
14. Key replacement object
14.1 Canonical structure
KeyReplacementRecord {
  version_major
  version_minor
  replacement_id
  old_key_ref
  new_key_ref
  role_scope_hash
  replacement_class
  activation_boundary
  issuer_policy_ref
  evidence_root?
  signature_envelopes
}
14.2 replacement_class examples
- planned_rotation
- emergency_rotation
- split_role_rotation
- recovery_replacement
- forced_reissue
14.3 Rule
Replacement MUST clearly identify both old and new key and the exact role scope.
15. Role re-authorization object
15.1 Need
Sometimes the new key is known, but its role must be explicitly re-authorized.
15.2 Canonical structure
RoleReauthorizationRecord {
  reauth_id
  role_class
  role_scope_hash
  actor_identity_ref
  new_key_ref
  activation_boundary
  attestation_root?
  issuer_policy_ref
}
15.3 Rule
For critical roles, replacing a key SHOULD typically require explicit re-authorization, not just a raw replacement record.
16. Key expiry handling
16.1 Rule
If a key class has an expiry policy, the expiry MUST be known before it becomes an operational risk.
16.2 Operators SHOULD monitor:
- upcoming expiries
- grace windows
- delayed rotation blockers
- role impact of expiry
16.3 Rule
A key approaching expiry for a critical role SHOULD trigger planned rotation before emergency behavior is needed.
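Expiry monitoring can be as simple as a periodic scan for keys whose expiry falls inside a lead-time window. The sketch below assumes expiries are stored as Unix milliseconds (matching the `_unix_ms` fields used elsewhere in this document); the function name and the `lead_time_ms` parameter are hypothetical.

```python
from typing import Dict, List


def keys_needing_planned_rotation(
    expiry_by_key: Dict[str, int],
    now_unix_ms: int,
    lead_time_ms: int,
) -> List[str]:
    """Return keys whose expiry falls within the lead time, soonest first.

    Keys that have already expired are included as well, since they are overdue.
    """
    due = [
        (expiry, key)
        for key, expiry in expiry_by_key.items()
        if expiry - now_unix_ms <= lead_time_ms
    ]
    return [key for _, key in sorted(due)]
```

The lead time would normally come from the per-class key policy, and should be long enough to complete a planned rotation with its grace window.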
17. Lost key handling
17.1 Key loss classes
- lost but believed uncompromised
- lost with unknown exposure
- lost and probably exposed
17.2 Rule
Unknown exposure SHOULD be treated conservatively, often similar to compromise until disproven.
17.3 Flow
Lost key handling SHOULD create:
- incident or key loss record
- revocation or quarantine as needed
- replacement path
- operator recovery checklist
18. Consensus role specifics
18.1 Proposer keys
Fast rotation MAY be possible, but MUST keep proposer identity unambiguous.
18.2 Verifier keys
Rotation MUST NOT create ambiguous vote validity across the activation boundary.
18.3 Notary keys
Rotation SHOULD be strictest. Overlap or unclear activation for notary keys is dangerous and SHOULD be minimized or forbidden.
18.4 Rule
Consensus role key changes SHOULD prefer epoch or clearly defined boundary activation.
19. Witness / issuer key specifics
19.1 Witness/oracle issuer keys
Rotation MUST preserve:
- issuer identity continuity if intended;
- revocation and supersession semantics;
- clear cutoff for old issuer key acceptance.
19.2 Rule
Critical issuer families SHOULD have publishable active-key state so verifiers can reject stale issuer keys deterministically.
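A publishable active-key state can be very small and still give verifiers a deterministic cutoff. The following sketch is one possible shape, not a defined ATLAS ZERO object: `IssuerKeyState`, its fields and the integer boundary are assumptions, and it models only a single old-to-new transition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IssuerKeyState:
    # Hypothetical published state for one issuer identity.
    issuer_id: str
    active_key_ref: str
    activation_boundary: int               # first boundary at which active_key_ref is accepted
    previous_key_ref: Optional[str] = None


def accept_issuer_key(state: IssuerKeyState, key_ref: str, boundary: int) -> bool:
    """Accept exactly one issuer key for any given boundary."""
    if boundary >= state.activation_boundary:
        return key_ref == state.active_key_ref
    # Before the cutoff only the previous key (if any) is acceptable.
    return key_ref == state.previous_key_ref
```

Because the decision depends only on the published state and the boundary of the signed object, two honest verifiers holding the same state reach the same verdict.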
20. Governance signer specifics
20.1 Governance keys
Rotation MUST preserve legitimacy of voting/review/activation roles.
20.2 Rule
Governance key rotation SHOULD be recorded with stronger review where it affects high-authority signers.
20.3 Rule
Compromised governance signer keys SHOULD trigger constitutionally valid containment or review, not informal swap only.
21. Release / artifact signer specifics
21.1 Release and genesis custodian keys
Compromise here is highly sensitive because it can poison provenance.
21.2 Rule
Compromise or suspected compromise of release-signing keys SHOULD trigger:
- artifact trust review;
- possible revocation of affected approvals;
- possible quarantine of affected release artifacts;
- replacement and re-attestation workflow.
22. Local admin key specifics
22.1 Local admin keys
These may not define protocol truth directly, but compromise can still enable dangerous local behavior.
22.2 Rule
Compromise of local admin keys SHOULD trigger local safe mode, operator recovery and config integrity review.
22.3 Rule
Local admin key compromise MUST NOT be confused with protocol signer compromise, but can lead to it.
23. Key health monitoring
23.1 Operators SHOULD monitor:
- signer liveness
- signing failures
- unexpected key usage
- key nearing expiry
- duplicate or unexpected signatures
- unauthorized scope use
- hardware security module or signing service anomalies if used
23.2 Rule
Unexpected key usage in critical scope SHOULD be treated as compromise signal until explained.
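Detecting unexpected key usage reduces to comparing observed signatures against the scopes each key is bound to. This is a minimal sketch with hypothetical names (`SignatureEvent`, `unexpected_usage`); in practice the allowed scopes would be derived from active role bindings.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class SignatureEvent:
    # Hypothetical usage-evidence entry: which key signed what, in which scope.
    key_ref: str
    role_scope: str
    object_id: str


def unexpected_usage(
    events: Iterable[SignatureEvent],
    allowed_scopes: Dict[str, FrozenSet[str]],
) -> List[SignatureEvent]:
    """Return events where a key signed in a scope it is not bound to.

    Keys with no entry in allowed_scopes are treated as having no authority.
    """
    return [
        event
        for event in events
        if event.role_scope not in allowed_scopes.get(event.key_ref, frozenset())
    ]
```

Each flagged event would then be handled as a compromise signal until explained, as rule 23.2 requires.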
24. Key usage evidence
24.1 The system SHOULD preserve evidence of:
- signatures produced
- role scope used
- boundary/time
- associated artifact/object ids
- verification results
24.2 Rule
Without key usage evidence, compromise diagnosis and attribution become weak.
25. Quarantine model for keys
25.1 A key MAY be quarantined when:
- compromise suspected
- signer behavior abnormal
- role binding ambiguous
- operator environment unstable
- duplicate unexpected usage observed
25.2 Effects
Key quarantine MAY imply:
- disable signing
- reject new signatures from key
- require validation-only or observer-only mode
- open incident
- prevent rejoin until review complete
25.3 Rule
Quarantine is stronger than caution, weaker than final revocation.
26. Recovery after compromise
26.1 Recovery SHOULD include:
- preserve evidence
- disable compromised or suspect key use
- issue compromise and/or revocation records
- activate or prepare replacement key
- run local integrity checks
- rerun operator preflight / restart / rejoin procedure
- monitor post-recovery behavior
- archive full audit trail
26.2 Rule
Recovery MUST NOT restore original unsafe key out of convenience unless compromise claim was clearly invalidated and policy permits.
27. Restart and rejoin after key rotation
27.1 After key change, operator SHOULD:
- verify new binding active
- verify old binding revoked or in allowed grace
- verify local config points only to intended key
- run preflight
- enter validation-only first if critical role
- re-enable signing in controlled order
27.2 Rule
Post-rotation rejoin SHOULD be conservative for consensus-signing roles.
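The rejoin checklist in 27.1 can be modelled as a gate that returns the most permissive mode the operator may enter. The mode labels ("blocked", "validation_only", "signing") and all names below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejoinChecks:
    # One flag per verification step in section 27.1 (names are hypothetical).
    new_binding_active: bool
    old_binding_revoked_or_in_grace: bool
    config_points_only_to_intended_key: bool
    preflight_passed: bool


def rejoin_mode(checks: RejoinChecks, critical_role: bool, validation_only_completed: bool) -> str:
    """Return the mode an operator may enter after a key change."""
    if not all((
        checks.new_binding_active,
        checks.old_binding_revoked_or_in_grace,
        checks.config_points_only_to_intended_key,
        checks.preflight_passed,
    )):
        return "blocked"
    # Critical roles pass through validation-only before signing is re-enabled.
    if critical_role and not validation_only_completed:
        return "validation_only"
    return "signing"
```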
28. Key rotation ceremony for critical scopes
28.1 For high-impact keys, a small formal ceremony SHOULD exist, including:
- replacement key confirmation
- binding confirmation
- old key revocation confirmation
- scope confirmation
- operator readiness confirmation
28.2 Rule
This is especially recommended for:
- notary keys
- governance signers
- release signers
- genesis custodians
- recovery operators with high authority
29. Decision ledger interaction
29.1 Key events SHOULD feed decision ledger through decisions such as:
- hold role activation
- quarantine scope
- restart approved
- rejoin approved
- restricted posture maintained
- proceed after rotation
29.2 Rule
Critical key events MUST NOT remain only as local ops notes.
30. Incident response interaction
30.1 Key compromise or suspected compromise SHOULD integrate with incident flow:
- open incident if severity justifies
- preserve evidence
- classify blast radius
- choose containment
- record recovery and closure
30.2 Rule
Confirmed exploitation of critical key SHOULD almost always be incident-grade.
31. Upgrade and fork interaction
31.1 Upgrades and forks MAY require coordinated key rotation:
- new governance signers
- new release signers
- new validator role keys
- domain separation changes
- new chain identity boundaries
31.2 Rule
Hard fork plans SHOULD explicitly say whether old keys remain valid on old or new chain and how replay/confusion is prevented.
32. Archive and audit interaction
32.1 Audit archives SHOULD preserve:
- registration records
- binding records
- compromise records
- revocation records
- replacement records
- recovery checklists
- decision records linked to key events
- relevant monitoring anomalies
32.2 Rule
Critical key history MUST remain reconstructible.
33. Key policy matrix
33.1 The system SHOULD maintain a matrix specifying for each key class:
- max active overlap
- expiry policy
- rotation minimum cadence
- emergency rotation path
- quarantine policy
- revocation authority
- reauthorization authority
- monitoring requirements
33.2 Rule
Key handling MUST be policy-driven, not artisanal.
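The matrix in 33.1 maps naturally onto one typed row per key class, plus a completeness check. The sketch below is an assumption-laden illustration: the field types, the millisecond units and the helper `missing_policies` are not defined by this document.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class KeyClassPolicy:
    # One row of the matrix in section 33.1; types and units are assumptions.
    max_active_overlap_ms: int
    expiry_ms: Optional[int]
    min_rotation_cadence_ms: Optional[int]
    emergency_rotation_path: str
    quarantine_policy: str
    revocation_authority: str
    reauthorization_authority: str
    monitoring_requirements: Tuple[str, ...]


def missing_policies(matrix: Dict[str, KeyClassPolicy], key_classes: Iterable[str]) -> List[str]:
    """Return key classes that have no policy row, in sorted order."""
    return sorted(set(key_classes) - set(matrix))
```

Running `missing_policies` against the full list of key classes from section 3.1 gives a quick check that no class is being handled without an explicit policy.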
34. Anti-patterns
Systems SHOULD avoid:
- same hot key reused across many critical roles without explicit policy
- rotation by local config change only with no protocol record
- long ambiguous overlap of old and new critical keys
- treating suspected compromise as harmless until perfect proof
- emergency rotation with no audit trail
- local admin key compromise ignored because “not protocol key”
- release signer compromise without artifact trust review
- governance signer swap without constitutional path
- restart directly into signing after key incident
- deleting old key records after replacement
35. Formal goals
AZ-035 pursues these goals:
35.1 Key lifecycle clarity
The system knows exactly when a key is registered, active, replaced, quarantined, revoked or archived.
35.2 Compromise containment
Critical key compromise can be contained quickly enough to matter.
35.3 Safe role continuity
Roles can survive key changes without ambiguous authority.
35.4 Audit-grade key history
A future reviewer can reconstruct the exact history of key events and resulting operational decisions.
36. Document formula
Key Rotation / Compromise / Recovery = explicit key classes + role bindings + planned/emergency rotation flows + revocation/replacement records + monitored usage + recovery with fail-closed bias
37. Relationship to the rest of the suite
- AZ-015 defines incident response.
- AZ-025 and AZ-029 define operator discipline.
- AZ-030 defines the decision ledger.
- AZ-034 defines upgrade and fork control.
- AZ-035 defines the key discipline that underpins all of these transitions and recoveries.
In short: AZ-035 is the operational trust infrastructure for the active identities of the live network.
38. What comes next
After AZ-035, the next document is:
AZ-036 — Network Upgrade Rollout and Version Compatibility Matrix
That document must fix:
- the concrete version compatibility matrix;
- the network rollout steps;
- mixed-fleet behavior;
- gating for signing and validation;
- and the controlled transition between versions before, during and after activation.
Closing
A protocol can tolerate many things. But if it does not know exactly which keys are valid, who replaced whom, and when a key becomes too risky to keep signing, the rest of the discipline begins to break down.
That is where real hygiene of operational identity begins: not when you have many keys, but when you know exactly how to create them, how to rotate them, how to stop them, and how to survive when one falls.