AZ-035 — Key Rotation, Key Compromise and Recovery Protocol v1

Status

This document defines the protocol for:

  • key classification;
  • normal key rotation;
  • emergency rotation;
  • key compromise;
  • revocation, replacement, and re-authorization of roles;
  • recovery after key-related incidents.

After AZ-001 through AZ-034, the following already exist:

  • the specification of the protocol and its subsystems;
  • the security model;
  • governance, incident response, and recovery;
  • launch discipline, monitoring, review, and archive;
  • the upgrade and hard fork protocol.

AZ-035 answers the question: how do we manage the lifecycle of keys and their associated authorizations so that key compromise, expiry, rotation, or loss does not produce operational chaos, role confusion, or continued operation under unsafe identities?

The purpose of this document is to fix:

  • key classes and their purposes;
  • the rules for planned rotation;
  • emergency rotation after suspected or confirmed compromise;
  • revocation and re-authorization of roles;
  • the effects on consensus, witness, governance, launch, and recovery;
  • the canonical objects and auditable evidence for all these transitions.

This document builds on:

  • AZ-002 through AZ-034, with direct emphasis on AZ-006, AZ-009, AZ-010, AZ-015, AZ-020, AZ-025, AZ-030, and AZ-034.

Terms:

  • MUST = mandatory
  • MUST NOT = forbidden
  • SHOULD = strongly recommended
  • MAY = optional

1. Objective

AZ-035 answers 10 critical questions:

  1. What types of keys exist in the ATLAS ZERO ecosystem?
  2. What does normal rotation mean versus emergency rotation?
  3. What does suspected compromise mean versus confirmed compromise?
  4. How is a key or a role revoked?
  5. How is a new key introduced without ambiguity?
  6. How are the proposer, verifier, notary, issuer, and governance signer roles affected?
  7. What happens if a key is lost, corrupted, or misused?
  8. How is operational and protocol-level recovery performed after a compromise?
  9. Which objects, manifests, and attestations preserve the audit trail of these changes?
  10. How do we avoid both unnecessary downtime and continued operation under unsafe keys?

2. Principles

2.1 Keys are role-bearing trust objects

Keys MUST be treated as trust objects bound to specific roles and policies, not merely as secret bytes.

2.2 Role identity and key material are related but not identical

The identity of a role can survive a key rotation, but this link MUST be explicit, not implicit.

2.3 Compromise handling must fail closed for critical scopes

If there is serious suspicion or confirmation of compromise for critical keys, the system SHOULD favor fail-closed behavior and controlled revocation/quarantine.

2.4 Rotation must be auditable

Any rotation, revocation, replacement, or re-authorization MUST leave auditable objects and evidence.

2.5 Emergency speed does not remove explicitness

Emergency rotation MAY accelerate the steps, but MUST NOT eliminate:

  • incident classification,
  • revocation/replacement objects,
  • scope delimitation,
  • and traceability.

2.6 Minimize unsafe overlap

Periods in which two keys appear simultaneously active for the same role SHOULD be minimized and strictly defined.


3. Key taxonomy

3.1 Recommended key classes

ATLAS ZERO SHOULD distinguish at least:

  • KEY_VALIDATOR_IDENTITY
  • KEY_PROPOSER
  • KEY_VERIFIER
  • KEY_NOTARY
  • KEY_WITNESS_ISSUER
  • KEY_ORACLE_ISSUER
  • KEY_GOVERNANCE_SIGNER
  • KEY_RELEASE_SIGNER
  • KEY_GENESIS_CUSTODIAN
  • KEY_ARCHIVE_ATTESTATION
  • KEY_OPERATOR_ADMIN_LOCAL
  • KEY_RECOVERY_OPERATOR

3.2 Meaning

KEY_VALIDATOR_IDENTITY

Binds the validator to its protocol identity.

KEY_PROPOSER / KEY_VERIFIER / KEY_NOTARY

Role keys used in consensus.

KEY_WITNESS_ISSUER / KEY_ORACLE_ISSUER

Keys for issuing witnesses/claims in controlled domains.

KEY_GOVERNANCE_SIGNER

Key for review, voting, or activation where the model requires it.

KEY_RELEASE_SIGNER / KEY_GENESIS_CUSTODIAN / KEY_ARCHIVE_ATTESTATION

Keys for artifacts and operational provenance.

KEY_OPERATOR_ADMIN_LOCAL

Key for local service control, with no direct protocol authority.

KEY_RECOVERY_OPERATOR

Key for recovery processes, if policy permits.

3.3 Rule

Policy MUST clearly define what authority each key class has.


4. Key state model

4.1 Standard states

Keys SHOULD move through states such as:

  • KS_REGISTERED
  • KS_PENDING_ACTIVATION
  • KS_ACTIVE
  • KS_GRACE_ACTIVE
  • KS_QUARANTINED
  • KS_REVOKED
  • KS_SUPERSEDED
  • KS_EXPIRED
  • KS_RECOVERED
  • KS_ARCHIVED

4.2 Meaning

KS_PENDING_ACTIVATION

The key has been introduced but does not yet have active authority.

KS_GRACE_ACTIVE

The old or new key is in a controlled transition window, if policy permits.

KS_QUARANTINED

The key is no longer trusted for normal operations until the situation is clarified.

KS_REVOKED

Authority has been explicitly withdrawn.

KS_SUPERSEDED

The key has been replaced by another valid key.

4.3 Rule

A key with confirmed compromise MUST NOT remain ACTIVE.
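
The state model can be sketched as a transition table. This is a minimal illustration, not a normative definition: AZ-035 names the states but does not fix the edges between them, so the `ALLOWED` table, the reading of KS_RECOVERED as a post-quarantine state, and the treatment of KS_GRACE_ACTIVE as authority-bearing are all assumptions.

```python
from enum import Enum, auto


class KeyState(Enum):
    KS_REGISTERED = auto()
    KS_PENDING_ACTIVATION = auto()
    KS_ACTIVE = auto()
    KS_GRACE_ACTIVE = auto()
    KS_QUARANTINED = auto()
    KS_REVOKED = auto()
    KS_SUPERSEDED = auto()
    KS_EXPIRED = auto()
    KS_RECOVERED = auto()
    KS_ARCHIVED = auto()


S = KeyState

# Hypothetical transition table (assumption, not taken from AZ-035).
ALLOWED = {
    S.KS_REGISTERED: {S.KS_PENDING_ACTIVATION, S.KS_REVOKED},
    S.KS_PENDING_ACTIVATION: {S.KS_ACTIVE, S.KS_REVOKED},
    S.KS_ACTIVE: {S.KS_GRACE_ACTIVE, S.KS_QUARANTINED, S.KS_REVOKED,
                  S.KS_SUPERSEDED, S.KS_EXPIRED},
    S.KS_GRACE_ACTIVE: {S.KS_QUARANTINED, S.KS_REVOKED,
                        S.KS_SUPERSEDED, S.KS_EXPIRED},
    S.KS_QUARANTINED: {S.KS_RECOVERED, S.KS_REVOKED},
    S.KS_RECOVERED: {S.KS_ACTIVE, S.KS_REVOKED},
    S.KS_REVOKED: {S.KS_ARCHIVED},
    S.KS_SUPERSEDED: {S.KS_ARCHIVED},
    S.KS_EXPIRED: {S.KS_ARCHIVED},
    S.KS_ARCHIVED: set(),
}

# States treated here as carrying signing authority (assumption).
AUTHORITATIVE = {S.KS_ACTIVE, S.KS_GRACE_ACTIVE}


def transition(current, target, compromise_confirmed=False):
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    # Rule 4.3: a confirmed-compromised key never holds authority.
    if compromise_confirmed and target in AUTHORITATIVE:
        raise ValueError("confirmed-compromised key cannot hold authority")
    return target
```

Keeping the edges in one table means a reviewer can audit the whole lifecycle in one place, and any unlisted move fails closed.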


5. Role binding model

5.1 Need

A role must be bound to a key or a set of keys through explicit objects.

5.2 Canonical structure

RoleKeyBinding {
  binding_id
  role_class
  role_scope_hash
  actor_identity_ref
  key_ref
  activation_boundary
  expiration_boundary?
  binding_status
}

5.3 binding_status

  • BINDING_PENDING
  • BINDING_ACTIVE
  • BINDING_GRACE
  • BINDING_REVOKED
  • BINDING_SUPERSEDED

5.4 Rule

The role binding MUST be the source of truth for a key's activity in that role, not the mere local presence of the key.
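
As an illustration of rule 5.4, the sketch below answers "may this key sign in this role?" purely from binding objects. Field names follow the RoleKeyBinding structure above. The choices to model boundaries as integers (for example epoch numbers), to treat `expiration_boundary` as exclusive, and the helper name `key_may_sign` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleKeyBinding:
    binding_id: str
    role_class: str
    role_scope_hash: str
    actor_identity_ref: str
    key_ref: str
    activation_boundary: int            # assumption: integer boundary
    expiration_boundary: Optional[int]  # assumption: exclusive upper bound
    binding_status: str


def key_may_sign(bindings, key_ref, role_class, role_scope_hash, boundary):
    """True only if an explicit binding grants this key the role at `boundary`."""
    for b in bindings:
        if (b.key_ref, b.role_class, b.role_scope_hash) != (
            key_ref, role_class, role_scope_hash
        ):
            continue
        if b.binding_status not in ("BINDING_ACTIVE", "BINDING_GRACE"):
            continue
        if boundary < b.activation_boundary:
            continue
        if b.expiration_boundary is not None and boundary >= b.expiration_boundary:
            continue
        return True
    # No matching binding: holding the key material locally grants nothing.
    return False
```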


6. Key registration object

6.1 Canonical structure

KeyRegistrationRecord {
  version_major
  version_minor

  key_registration_id
  key_class
  public_key_ref
  owner_identity_ref
  policy_ref?
  metadata_hash?
  registration_time_unix_ms
  attestation_refs?
}

6.2 Rule

A key SHOULD be registered before it can be activated in any critical role scope.


7. Rotation classes

7.1 Standard rotation classes

  • ROT_PLANNED
  • ROT_SCHEDULED_EXPIRY
  • ROT_HYGIENE
  • ROT_INFRA_REFRESH
  • ROT_SUSPECTED_COMPROMISE
  • ROT_CONFIRMED_COMPROMISE
  • ROT_KEY_LOSS
  • ROT_POLICY_CHANGE
  • ROT_ROLE_SPLIT
  • ROT_POST_INCIDENT_RECOVERY

7.2 Rule

Rotation class MUST be explicit because procedures and urgency differ materially.


8. Key compromise classification

8.1 Standard compromise levels

  • KC_NONE
  • KC_SUSPECTED
  • KC_PROBABLE
  • KC_CONFIRMED
  • KC_CONFIRMED_EXPLOITED

8.2 Meaning

KC_SUSPECTED

Signals are incomplete but sufficient to warrant heightened caution.

KC_PROBABLE

Indications are strong but not fully conclusive.

KC_CONFIRMED

The compromise is confirmed.

KC_CONFIRMED_EXPLOITED

The compromised key has already been misused, or there is evidence of adversarial use.

8.3 Rule

Escalation and revocation behavior SHOULD depend on compromise level and key criticality.
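
One way to make rule 8.3 concrete is a small decision function. The mapping below is a hypothetical policy, not one prescribed by AZ-035: the action labels are invented for the example, and the set of critical classes is borrowed from the key types singled out for a formal ceremony in section 28.2.

```python
from enum import IntEnum


class CompromiseLevel(IntEnum):
    KC_NONE = 0
    KC_SUSPECTED = 1
    KC_PROBABLE = 2
    KC_CONFIRMED = 3
    KC_CONFIRMED_EXPLOITED = 4


# Assumption: which key classes count as critical for escalation purposes.
CRITICAL_CLASSES = {
    "KEY_NOTARY",
    "KEY_GOVERNANCE_SIGNER",
    "KEY_RELEASE_SIGNER",
    "KEY_GENESIS_CUSTODIAN",
}


def containment_action(level, key_class):
    """Pick a containment action from compromise level and key criticality."""
    critical = key_class in CRITICAL_CLASSES
    if level >= CompromiseLevel.KC_CONFIRMED:
        return "revoke"
    if level == CompromiseLevel.KC_PROBABLE:
        return "quarantine"
    if level == CompromiseLevel.KC_SUSPECTED:
        # Fail closed for critical scopes, watch closely otherwise.
        return "quarantine" if critical else "heightened_monitoring"
    return "none"
```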


9. Key compromise object

9.1 Canonical structure

KeyCompromiseRecord {
  version_major
  version_minor

  compromise_id
  key_ref
  key_class
  compromise_level
  affected_role_scopes_root
  evidence_root?
  detected_at_unix_ms
  reporter_policy_ref?
  status
}

9.2 status

  • OPEN
  • CONTAINING
  • REVOKED
  • RECOVERING
  • RESOLVED
  • ARCHIVED

9.3 Rule

Critical compromise records MUST be linkable to incident response and decision ledger.


10. Planned rotation flow

10.1 Canonical order

  1. register replacement key
  2. attest replacement key provenance if required
  3. create new pending role binding
  4. announce or publish operator/release guidance if needed
  5. activate new binding at boundary
  6. move old binding to grace or superseded
  7. revoke old binding at end of grace if used
  8. archive rotation evidence

10.2 Rule

Planned rotation SHOULD avoid ambiguous simultaneous active authority unless policy explicitly allows bounded overlap.
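
Steps 5 to 7 of the canonical order can be sketched as a pure function of the current boundary. This is an assumption-laden illustration: boundaries are modeled as integers, `grace_length` is a hypothetical policy parameter, and the function name is invented for the example.

```python
def binding_statuses(boundary, activation_boundary, grace_length=0):
    """Return (old_binding_status, new_binding_status) at `boundary`.

    grace_length == 0 means no overlap: the old binding is superseded
    at the same boundary at which the new one becomes active.
    """
    if boundary < activation_boundary:
        return ("BINDING_ACTIVE", "BINDING_PENDING")
    if grace_length == 0:
        return ("BINDING_SUPERSEDED", "BINDING_ACTIVE")
    if boundary < activation_boundary + grace_length:
        return ("BINDING_GRACE", "BINDING_ACTIVE")
    # Grace was used and has ended: the old binding is revoked (step 7).
    return ("BINDING_REVOKED", "BINDING_ACTIVE")
```

Because the result depends only on the boundary and recorded parameters, every node evaluating the same records reaches the same answer about which binding holds authority.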


11. Emergency rotation flow

11.1 Canonical order

  1. classify compromise level
  2. quarantine risky role or disable signing immediately where needed
  3. issue compromise record
  4. revoke or quarantine old key binding
  5. activate replacement key under emergency path if available
  6. notify operators and relevant governance/security roles
  7. collect post-rotation evidence
  8. run recovery and stabilization checks

11.2 Rule

For critical roles, emergency rotation SHOULD prioritize safe stop over continuity if both cannot be preserved.
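
The emergency order and the safe-stop preference can be sketched as a plan builder. The step identifiers paraphrase the list in 11.1; the posture labels `validation_only` and `safe_stop` are hypothetical names chosen for the example, and resuming in validation-only mode is an assumption that mirrors the conservative rejoin guidance in section 27.

```python
EMERGENCY_STEPS = (
    "classify_compromise_level",
    "quarantine_role_or_disable_signing",
    "issue_compromise_record",
    "revoke_or_quarantine_old_binding",
    "activate_replacement_key",
    "notify_operators_and_governance",
    "collect_post_rotation_evidence",
    "run_recovery_and_stabilization_checks",
)


def emergency_rotation_plan(replacement_available):
    """Return (ordered steps, resulting posture) for an emergency rotation."""
    # Without a replacement key the activation step is skipped, never improvised.
    steps = [
        step for step in EMERGENCY_STEPS
        if step != "activate_replacement_key" or replacement_available
    ]
    posture = "validation_only" if replacement_available else "safe_stop"
    return steps, posture
```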


12. Grace periods

12.1 Need

Some rotations may need bounded overlap.

12.2 Rule

Grace periods MAY exist only if:

  • the role permits it;
  • overlap semantics are explicit;
  • monitoring can distinguish which key is used;
  • ambiguity and replay risks are controlled.

12.3 Rule

Compromised or probably compromised keys SHOULD NOT receive generous grace windows.
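
The grace conditions can be expressed as a single gate. In this sketch the parameter names are hypothetical, and denying any grace to probably or confirmed compromised keys is a stricter reading than the "SHOULD NOT receive generous grace windows" wording of 12.3.

```python
COMPROMISED_LEVELS = {"KC_PROBABLE", "KC_CONFIRMED", "KC_CONFIRMED_EXPLOITED"}


def allowed_grace(policy_max_overlap, role_permits_overlap, semantics_explicit,
                  keys_distinguishable, replay_controlled,
                  compromise_level="KC_NONE"):
    """Return the permitted overlap window in boundaries (0 means none)."""
    if compromise_level in COMPROMISED_LEVELS:
        return 0  # stricter-than-required assumption: no grace at all
    conditions = (role_permits_overlap, semantics_explicit,
                  keys_distinguishable, replay_controlled)
    # Every condition from 12.2 must hold; otherwise overlap is refused.
    return policy_max_overlap if all(conditions) else 0
```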


13. Key revocation object

13.1 Canonical structure

KeyRevocationRecord {
  version_major
  version_minor

  revocation_id
  key_ref
  key_class
  revoked_role_scopes_root
  revocation_reason_class
  effective_boundary
  issuer_policy_ref
  evidence_root?
  signature_envelopes
}

13.2 revocation_reason_class examples

  • routine_rotation
  • key_expiry
  • suspected_compromise
  • confirmed_compromise
  • confirmed_exploitation
  • loss_of_control
  • policy_violation
  • role_decommissioned

13.3 Rule

Revocation MUST be explicit and scope-bound.
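
A scope-bound revocation check might look like the sketch below. It simplifies the record: `revoked_role_scopes_root` is a root commitment in the canonical structure, whereas here the revoked scopes are held as a plain set, and boundaries are assumed to be integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revocation:
    key_ref: str
    revoked_role_scopes: frozenset  # simplification of revoked_role_scopes_root
    effective_boundary: int         # assumption: integer boundary


def is_revoked(revocations, key_ref, role_scope_hash, boundary):
    """True if some revocation covers this key, in this scope, at this boundary."""
    return any(
        r.key_ref == key_ref
        and role_scope_hash in r.revoked_role_scopes
        and boundary >= r.effective_boundary
        for r in revocations
    )
```

A revocation for one scope deliberately leaves the same key untouched in other scopes, which is what "scope-bound" implies; revoking a key everywhere requires listing every scope explicitly.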


14. Key replacement object

14.1 Canonical structure

KeyReplacementRecord {
  version_major
  version_minor

  replacement_id
  old_key_ref
  new_key_ref
  role_scope_hash
  replacement_class
  activation_boundary
  issuer_policy_ref
  evidence_root?
  signature_envelopes
}

14.2 replacement_class examples

  • planned_rotation
  • emergency_rotation
  • split_role_rotation
  • recovery_replacement
  • forced_reissue

14.3 Rule

Replacement MUST clearly identify both old and new key and the exact role scope.


15. Role re-authorization object

15.1 Need

Sometimes the new key is known, but its role must be explicitly re-authorized.

15.2 Canonical structure

RoleReauthorizationRecord {
  reauth_id
  role_class
  role_scope_hash
  actor_identity_ref
  new_key_ref
  activation_boundary
  attestation_root?
  issuer_policy_ref
}

15.3 Rule

For critical roles, key replacement SHOULD often require re-authorization, not just a raw replacement record.


16. Key expiry handling

16.1 Rule

If a key class has an expiry policy, the expiry MUST be known before it becomes an operational risk.

16.2 Operators SHOULD monitor:

  • upcoming expiries
  • grace windows
  • delayed rotation blockers
  • role impact of expiry

16.3 Rule

A key approaching expiry for a critical role SHOULD trigger planned rotation before emergency behavior is needed.
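
Expiry monitoring can be as simple as comparing known expiry times against a lead time. In this sketch the input mapping, the Unix-millisecond timestamps, and the `lead_time_ms` parameter are assumptions; AZ-035 does not specify how expiries are stored.

```python
def keys_needing_rotation(expiries, now_ms, lead_time_ms):
    """Return key refs whose expiry falls within the lead time, sorted.

    `expiries` maps key_ref -> expiry timestamp in Unix milliseconds.
    Keys that have already expired are included as well.
    """
    return sorted(
        key_ref for key_ref, expiry_ms in expiries.items()
        if expiry_ms - now_ms <= lead_time_ms
    )
```

For example, with a lead time of 200 ms and `now_ms=800`, keys expiring at 900 and 1000 are flagged, while a key expiring at 5000 is not.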


17. Lost key handling

17.1 Key loss classes

  • lost but believed uncompromised
  • lost with unknown exposure
  • lost and probably exposed

17.2 Rule

Unknown exposure SHOULD be treated conservatively, often similar to compromise until disproven.

17.3 Flow

Lost key handling SHOULD create:

  • incident or key loss record
  • revocation or quarantine as needed
  • replacement path
  • operator recovery checklist

18. Consensus role specifics

18.1 Proposer keys

Fast rotation MAY be possible, but MUST keep proposer identity unambiguous.

18.2 Verifier keys

Rotation MUST NOT create ambiguous vote validity across the boundary.

18.3 Notary keys

Rotation SHOULD be strictest. Overlap or unclear activation for notary keys is dangerous and SHOULD be minimized or forbidden.

18.4 Rule

Consensus role key changes SHOULD prefer epoch or clearly defined boundary activation.


19. Witness / issuer key specifics

19.1 Witness/oracle issuer keys

Rotation MUST preserve:

  • issuer identity continuity if intended;
  • revocation and supersession semantics;
  • clear cutoff for old issuer key acceptance.

19.2 Rule

Critical issuer families SHOULD have publishable active-key state so verifiers can reject stale issuer keys deterministically.
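
A publishable active-key state could take the following shape. The dictionary layout, the field names `active_key_ref`, `active_from`, and `old_keys`, and the integer boundaries are all hypothetical; the sketch only illustrates that acceptance can be decided from published state alone.

```python
def accept_issuer_signature(active_key_state, issuer_id, key_ref, boundary):
    """Decide from published state whether an issuer key is acceptable."""
    entry = active_key_state.get(issuer_id)
    if entry is None:
        return False  # unknown issuer family: reject
    if key_ref == entry["active_key_ref"]:
        return boundary >= entry["active_from"]
    # Old keys stay acceptable only strictly before their published cutoff.
    cutoff = entry.get("old_keys", {}).get(key_ref)
    return cutoff is not None and boundary < cutoff
```

Usage, under the same assumptions:

```python
state = {
    "issuer-1": {
        "active_key_ref": "k2",
        "active_from": 100,
        "old_keys": {"k1": 100},  # k1 is rejected from boundary 100 onward
    }
}
accept_issuer_signature(state, "issuer-1", "k1", 99)   # accepted
accept_issuer_signature(state, "issuer-1", "k1", 100)  # rejected
```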


20. Governance signer specifics

20.1 Governance keys

Rotation MUST preserve legitimacy of voting/review/activation roles.

20.2 Rule

Governance key rotation SHOULD be recorded with stronger review where it affects high-authority signers.

20.3 Rule

Compromised governance signer keys SHOULD trigger constitutionally valid containment or review, not merely an informal swap.


21. Release / artifact signer specifics

21.1 Release and genesis custodian keys

Compromise here is highly sensitive because it can poison provenance.

21.2 Rule

Compromise or suspected compromise of release-signing keys SHOULD trigger:

  • artifact trust review;
  • possible revocation of affected approvals;
  • possible quarantine of affected release artifacts;
  • replacement and re-attestation workflow.

22. Local admin key specifics

22.1 Local admin keys

These may not define protocol truth directly, but compromise can still enable dangerous local behavior.

22.2 Rule

Compromise of local admin keys SHOULD trigger local safe mode, operator recovery and config integrity review.

22.3 Rule

Local admin key compromise MUST NOT be confused with protocol signer compromise, although it can lead to one.


23. Key health monitoring

23.1 Operators SHOULD monitor:

  • signer liveness
  • signing failures
  • unexpected key usage
  • key nearing expiry
  • duplicate or unexpected signatures
  • unauthorized scope use
  • hardware security module or signing service anomalies if used

23.2 Rule

Unexpected key usage in critical scope SHOULD be treated as compromise signal until explained.
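
Detecting unauthorized scope use reduces to comparing observed signing events against the scopes each key is bound to. The event shape `(key_ref, role_scope_hash)` and the `authorized_scopes` mapping are assumptions for the sketch; in practice they would be derived from key usage evidence and role bindings.

```python
def unexpected_usage(events, authorized_scopes):
    """Return the events in which a key signed outside its authorized scopes.

    `events` is an iterable of (key_ref, role_scope_hash) pairs;
    `authorized_scopes` maps key_ref -> set of permitted scope hashes.
    """
    return [
        (key_ref, scope)
        for key_ref, scope in events
        # Keys with no entry at all are treated as having no authority.
        if scope not in authorized_scopes.get(key_ref, set())
    ]
```

Each returned event is a candidate compromise signal that stays open until an operator explains it.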


24. Key usage evidence

24.1 The system SHOULD preserve evidence of:

  • signatures produced
  • role scope used
  • boundary/time
  • associated artifact/object ids
  • verification results

24.2 Rule

Without key usage evidence, compromise diagnosis and attribution become weak.


25. Quarantine model for keys

25.1 A key MAY be quarantined when:

  • compromise suspected
  • signer behavior abnormal
  • role binding ambiguous
  • operator environment unstable
  • duplicate unexpected usage observed

25.2 Effects

Key quarantine MAY imply:

  • disable signing
  • reject new signatures from key
  • require validation-only or observer-only mode
  • open incident
  • prevent rejoin until review complete

25.3 Rule

Quarantine is stronger than caution, weaker than final revocation.


26. Recovery after compromise

26.1 Recovery SHOULD include:

  1. preserve evidence
  2. disable compromised or suspect key use
  3. issue compromise and/or revocation records
  4. activate or prepare replacement key
  5. run local integrity checks
  6. rerun operator preflight / restart / rejoin procedure
  7. monitor post-recovery behavior
  8. archive full audit trail

26.2 Rule

Recovery MUST NOT restore original unsafe key out of convenience unless compromise claim was clearly invalidated and policy permits.


27. Restart and rejoin after key rotation

27.1 After key change, operator SHOULD:

  • verify new binding active
  • verify old binding revoked or in allowed grace
  • verify local config points only to intended key
  • run preflight
  • enter validation-only first if critical role
  • re-enable signing in controlled order

27.2 Rule

Post-rotation rejoin SHOULD be conservative for consensus-signing roles.


28. Key rotation ceremony for critical scopes

28.1 For high-impact keys, a small formal ceremony SHOULD exist, including:

  • replacement key confirmation
  • binding confirmation
  • old key revocation confirmation
  • scope confirmation
  • operator readiness confirmation

28.2 Rule

This is especially recommended for:

  • notary keys
  • governance signers
  • release signers
  • genesis custodians
  • recovery operators with high authority

29. Decision ledger interaction

29.1 Key events SHOULD feed decision ledger through decisions such as:

  • hold role activation
  • quarantine scope
  • restart approved
  • rejoin approved
  • restricted posture maintained
  • proceed after rotation

29.2 Rule

Critical key events MUST NOT remain only as local ops notes.


30. Incident response interaction

30.1 Key compromise or suspected compromise SHOULD integrate with incident flow:

  • open incident if severity justifies
  • preserve evidence
  • classify blast radius
  • choose containment
  • record recovery and closure

30.2 Rule

Confirmed exploitation of a critical key SHOULD almost always be incident-grade.


31. Upgrade and fork interaction

31.1 Upgrades and forks MAY require coordinated key rotation:

  • new governance signers
  • new release signers
  • new validator role keys
  • domain separation changes
  • new chain identity boundaries

31.2 Rule

Hard fork plans SHOULD explicitly say whether old keys remain valid on old or new chain and how replay/confusion is prevented.


32. Archive and audit interaction

32.1 Audit archives SHOULD preserve:

  • registration records
  • binding records
  • compromise records
  • revocation records
  • replacement records
  • recovery checklists
  • decision records linked to key events
  • relevant monitoring anomalies

32.2 Rule

Critical key history MUST remain reconstructible.


33. Key policy matrix

33.1 The system SHOULD maintain a matrix specifying for each key class:

  • max active overlap
  • expiry policy
  • rotation minimum cadence
  • emergency rotation path
  • quarantine policy
  • revocation authority
  • reauthorization authority
  • monitoring requirements

33.2 Rule

Key handling MUST be policy-driven, not artisanal.
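
The matrix can be represented as typed data with a lookup that refuses to guess. The field names mirror the list in 33.1; every value in the single example row is a placeholder invented for illustration, not a recommended setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyClassPolicy:
    max_active_overlap: int          # in boundaries; 0 forbids overlap
    expiry_policy: str
    rotation_min_cadence: str
    emergency_rotation_path: str
    quarantine_policy: str
    revocation_authority: str
    reauthorization_authority: str
    monitoring_requirements: tuple


# Placeholder row: all values are hypothetical.
POLICY_MATRIX = {
    "KEY_NOTARY": KeyClassPolicy(
        max_active_overlap=0,
        expiry_policy="fixed_term",
        rotation_min_cadence="per_term",
        emergency_rotation_path="safe_stop_then_replace",
        quarantine_policy="on_suspicion",
        revocation_authority="governance",
        reauthorization_authority="governance",
        monitoring_requirements=("signer_liveness", "duplicate_signatures"),
    ),
}


def policy_for(key_class):
    """Return the policy row for a key class, failing closed if none exists."""
    try:
        return POLICY_MATRIX[key_class]
    except KeyError:
        raise LookupError(f"no policy defined for {key_class}") from None
```

Raising on a missing row, rather than falling back to defaults, keeps a key class without an explicit policy from being handled informally.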


34. Anti-patterns

Systems SHOULD avoid:

  1. same hot key reused across many critical roles without explicit policy
  2. rotation by local config change only with no protocol record
  3. long ambiguous overlap of old and new critical keys
  4. treating suspected compromise as harmless until perfect proof
  5. emergency rotation with no audit trail
  6. local admin key compromise ignored because “not protocol key”
  7. release signer compromise without artifact trust review
  8. governance signer swap without constitutional path
  9. restart directly into signing after key incident
  10. deleting old key records after replacement

35. Formal goals

AZ-035 pursues the following goals:

35.1 Key lifecycle clarity

The system knows exactly when a key is registered, active, replaced, quarantined, revoked or archived.

35.2 Compromise containment

Critical key compromise can be contained quickly enough to matter.

35.3 Safe role continuity

Roles can survive key changes without ambiguous authority.

35.4 Audit-grade key history

A future reviewer can reconstruct the exact history of key events and resulting operational decisions.


36. Document formula

Key Rotation / Compromise / Recovery = explicit key classes + role bindings + planned/emergency rotation flows + revocation/replacement records + monitored usage + recovery with fail-closed bias


37. Relationship to the rest of the suite

  • AZ-015 defines incident response.
  • AZ-025 and AZ-029 define operator discipline.
  • AZ-030 defines the decision ledger.
  • AZ-034 defines upgrade and fork control.
  • AZ-035 defines the key discipline that supports all these transitions and recoveries.

In short: AZ-035 is the operational trust infrastructure for the active identities of the live network.


38. What comes next

After AZ-035, the next document is:

AZ-036 — Network Upgrade Rollout and Version Compatibility Matrix

That document should fix:

  • the concrete version compatibility matrix;
  • the network rollout steps;
  • mixed-fleet behavior;
  • gating for signing and validation;
  • and the controlled transition between versions before, during, and after activation.

Closing

A protocol can tolerate many things. But if it does not know exactly which keys are valid, who replaced whom, and when a key becomes too risky to keep signing, the rest of the discipline begins to break down.

That is where real hygiene of operational identity begins: not when you have many keys, but when you know exactly how to create them, how to rotate them, how to stop them, and how to survive when one falls.