AZ-029 — Concrete Operator Checklists v1

Status

This document turns the operations manual and the launch window procedure into concrete, executable and auditable checklists for ATLAS ZERO operators.

AZ-025 defined the validator and operator manual. AZ-028 defined the procedure for the actual launch window. AZ-029 answers the question: what exact, short, operational list must the operator follow in each critical phase to reduce human error and preserve auditability?

The purpose of this document is to define:

  • separate checklists per phase;
  • checklists per role;
  • pass/fail/hold criteria;
  • minimum signing and timestamp fields;
  • evidence retention rules;
  • the relationship between checklists and canonical artifacts.

This document builds on:

  • AZ-002 through AZ-028, with direct emphasis on AZ-015, AZ-017, AZ-022, AZ-025, AZ-026 and AZ-028.

Terms:

  • MUST = mandatory
  • MUST NOT = forbidden
  • SHOULD = strongly recommended
  • MAY = optional

1. Objective

AZ-029 answers 10 practical questions:

  1. What minimum set of checklists does an operator need?
  2. What is verified before launch?
  3. What exactly is verified in the launch window?
  4. What is verified after bootstrap and in the first epochs?
  5. What separate checklists do the proposer, verifier and notary have?
  6. How are pass, fail, hold and escalation marked?
  7. How are checklists signed or archived?
  8. How is box-ticking without real verification avoided?
  9. How is a checklist tied to artifacts, hashes and roles?
  10. How are checklists used during incident, restart and rejoin?

2. Principii

2.1 Checklists are execution tools, not decorative docs

Checklists MUST be short and clear enough to be followed in real time.

2.2 Every line must map to a real verification

A checklist item MUST correspond to an actual verification, not to an impression.

2.3 Pass/fail must be explicit

Each item SHOULD have an explicit result:

  • pass
  • fail
  • hold
  • n/a

2.4 Checklist truth must bind to exact scope

Critical checklists MUST anchor:

  • release package id
  • genesis package id
  • genesis_hash
  • chain_id
  • role scope
  • timestamp
  • operator identity

2.5 Failed checklist item is a signal, not embarrassment

A fail must be treated as a useful operational signal. Hiding it is a more serious defect than the fail itself.


3. Checklist families

3.1 Core families

ATLAS ZERO SHOULD support at least these families:

  1. CL_PRELAUNCH_ARTIFACTS
  2. CL_PRELAUNCH_ENVIRONMENT
  3. CL_PRELAUNCH_KEYS_AND_ROLES
  4. CL_PREFLIGHT_NODE
  5. CL_LAUNCH_WINDOW_FINAL_CHECKS
  6. CL_BOOTSTRAP_START
  7. CL_FIRST_BLOCKS
  8. CL_FIRST_EPOCHS
  9. CL_RESTRICTED_POSTURE
  10. CL_INCIDENT_LOCAL_RESPONSE
  11. CL_RESTART_AND_REJOIN
  12. CL_POST_LAUNCH_NORMALIZATION

3.2 Role-specific families

The following role-specific families SHOULD also exist:

  • CL_PROPOSER_ROLE
  • CL_VERIFIER_ROLE
  • CL_NOTARY_ROLE
  • CL_OBSERVER_ROLE

4. Checklist object model

4.1 Canonical structure

OperatorChecklistRecord {
  version_major
  version_minor

  checklist_record_id
  checklist_class
  checklist_template_version
  operator_identity_ref
  node_identity_ref?
  role_scope
  target_network_class
  target_chain_id
  target_genesis_hash
  target_release_package_id
  target_genesis_package_id
  started_at_unix_ms
  completed_at_unix_ms?
  overall_verdict
  item_root
  notes_hash?
}

4.2 overall_verdict

  • PASS
  • PASS_WITH_NOTES
  • HOLD
  • FAIL
  • ABORTED
  • INCOMPLETE

4.3 Rule

Critical checklists SHOULD be treated as auditable objects, not merely as text ticked off locally.
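A minimal sketch of how the record in 4.1 might be represented in Python. The field types (strings for ids and hashes, integers for versions and timestamps) and the helper `missing_scope_fields` are assumptions made for illustration; the structure above fixes only the field names and which of them are optional.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class OverallVerdict(Enum):
    """The six verdict values listed in 4.2."""
    PASS = "PASS"
    PASS_WITH_NOTES = "PASS_WITH_NOTES"
    HOLD = "HOLD"
    FAIL = "FAIL"
    ABORTED = "ABORTED"
    INCOMPLETE = "INCOMPLETE"


@dataclass(frozen=True)
class OperatorChecklistRecord:
    # Field types are assumed; only the names come from 4.1.
    version_major: int
    version_minor: int
    checklist_record_id: str
    checklist_class: str
    checklist_template_version: str
    operator_identity_ref: str
    role_scope: str
    target_network_class: str
    target_chain_id: str
    target_genesis_hash: str
    target_release_package_id: str
    target_genesis_package_id: str
    started_at_unix_ms: int
    overall_verdict: OverallVerdict
    item_root: str
    # Fields marked "?" in 4.1 are optional.
    node_identity_ref: Optional[str] = None
    completed_at_unix_ms: Optional[int] = None
    notes_hash: Optional[str] = None


# Scope anchors from 2.4 that map directly to record fields.
SCOPE_ANCHORS = (
    "target_release_package_id",
    "target_genesis_package_id",
    "target_genesis_hash",
    "target_chain_id",
    "role_scope",
    "started_at_unix_ms",
    "operator_identity_ref",
)


def missing_scope_fields(record: OperatorChecklistRecord) -> List[str]:
    """Return the scope anchors that are empty or zero (hypothetical helper)."""
    return [name for name in SCOPE_ANCHORS if not getattr(record, name)]
```

Freezing the dataclass is a design choice in this sketch: once a record is emitted, it is treated as an immutable audit object rather than a mutable local form.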


5. Checklist item model

5.1 Canonical structure

ChecklistItemResult {
  item_id
  item_class
  prompt_hash
  result
  evidence_refs?
  timestamp_unix_ms
  operator_note_hash?
}

5.2 result

  • PASS
  • FAIL
  • HOLD
  • N_A

5.3 Rule

PASS SHOULD mean actual verification completed. It MUST NOT mean “presumed OK”.
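The item structure in 5.1 carries a `prompt_hash`, and the record in 4.1 carries an `item_root`, but this document does not say how either is computed. The sketch below shows one possible construction, assuming SHA-256 and a sorted-key JSON encoding; both choices, and the helper names `hash_prompt` and `compute_item_root`, are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Sequence, Tuple

VALID_RESULTS = ("PASS", "FAIL", "HOLD", "N_A")


@dataclass(frozen=True)
class ChecklistItemResult:
    item_id: str
    item_class: str
    prompt_hash: str
    result: str
    timestamp_unix_ms: int
    evidence_refs: Tuple[str, ...] = ()
    operator_note_hash: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject anything outside the four values listed in 5.2.
        if self.result not in VALID_RESULTS:
            raise ValueError(f"unknown result: {self.result!r}")


def hash_prompt(prompt_text: str) -> str:
    """SHA-256 of the UTF-8 prompt text (the algorithm is an assumption)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def compute_item_root(items: Sequence[ChecklistItemResult]) -> str:
    """Order-sensitive commitment over item results.

    Hypothetical construction: hash each item's sorted-key JSON form, then
    hash the concatenated digests. A real deployment might use a Merkle tree.
    """
    outer = hashlib.sha256()
    for item in items:
        encoded = json.dumps(asdict(item), sort_keys=True, separators=(",", ":"))
        outer.update(hashlib.sha256(encoded.encode("utf-8")).digest())
    return outer.hexdigest()
```

Binding the prompt text by hash means a later reader can tell whether the operator answered the same question the template asked, which supports the rule that PASS reflects an actual verification.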


6. Evidence reference policy

6.1 Every critical item SHOULD link to evidence where possible:

  • artifact ids
  • hashes
  • preflight result id
  • live status record id
  • log refs
  • state root refs
  • checkpoint record ids

6.2 Rule

For TIER_3/TIER_4 operational checkpoints, evidence-less pass SHOULD be discouraged or forbidden by policy.
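As an illustration of the rule in 6.2, the following sketch flags PASS results that carry no evidence references when the checkpoint tier requires them. Items are modeled as plain dictionaries, and the function name `evidence_less_passes` is an assumption; how tiers are assigned is outside this document.

```python
from typing import Dict, List, Sequence

# Tiers named in 6.2 for which an evidence-less PASS is not acceptable.
EVIDENCE_REQUIRED_TIERS = frozenset({"TIER_3", "TIER_4"})


def evidence_less_passes(items: Sequence[Dict], tier: str) -> List[str]:
    """Return ids of items marked PASS without evidence at a strict tier."""
    if tier not in EVIDENCE_REQUIRED_TIERS:
        return []
    return [
        item["item_id"]
        for item in items
        if item["result"] == "PASS" and not item.get("evidence_refs")
    ]
```

Whether a flagged item is merely reported or actively rejected is left to policy, matching the "discouraged or forbidden" wording of the rule.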


7. Template versioning

7.1 Need

Checklists evolve over time.

7.2 Rule

Every checklist MUST reference:

  • checklist template class
  • template version

7.3 Rule

Template changes near launch SHOULD be tightly controlled and reviewed.


8. Minimal operator identification

8.1 Checklist records SHOULD include:

  • operator identity ref
  • node identity ref if applicable
  • role scope
  • environment or cluster label if applicable

8.2 Rule

A critical checklist without an identifiable operator and role has low audit value.


9. Prelaunch artifact checklist

9.1 Class

CL_PRELAUNCH_ARTIFACTS

9.2 Purpose

Verify exact launch artifacts before any node role activation.

9.3 Minimum items

  1. release package manifest loaded
  2. release package id matches expected scope
  3. local binary hash matches approved artifact
  4. genesis package manifest loaded
  5. genesis package id matches expected scope
  6. recomputed genesis_hash matches expected value
  7. recomputed chain_id matches expected value
  8. derived roots commitment verified
  9. attestation sufficiency verified
  10. release/genesis compatibility verified

9.4 Hold/fail guidance

Any mismatch on binary hash, genesis_hash, chain_id or package id => FAIL.
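Item 3 of 9.3 (local binary hash matches approved artifact) could be checked along these lines. The sketch assumes SHA-256 as the artifact hash, which this document does not specify, and uses an in-memory stream as a stand-in for the real binary; `check_artifact_hash` is a hypothetical helper name.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_artifact_hash(stream: BinaryIO, approved_hex: str) -> str:
    """Return "PASS" on an exact match and "FAIL" on any mismatch (see 9.4)."""
    actual = sha256_stream(stream)
    return "PASS" if actual == approved_hex.strip().lower() else "FAIL"


# Stand-in for an opened binary; a real check would use open(path, "rb").
demo_binary = io.BytesIO(b"")
# SHA-256 of the empty byte string, used here as the "approved" value.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
demo_result = check_artifact_hash(demo_binary, EMPTY_SHA256)
```

There is deliberately no HOLD branch: under 9.4 a hash mismatch is a hard FAIL, not something to wait out.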


10. Prelaunch environment checklist

10.1 Class

CL_PRELAUNCH_ENVIRONMENT

10.2 Minimum items

  1. host identity confirmed
  2. storage paths writable
  3. sufficient free disk space
  4. logging sink reachable
  5. metrics sink reachable
  6. snapshot path available
  7. local clock within tolerated drift
  8. network/firewall configuration loaded
  9. resource limits configured
  10. no undeclared config overrides present

10.3 Rule

Clock drift or broken persistent storage SHOULD block consensus-role launch.
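Two of the environment items (free disk space and clock drift) lend themselves to simple automated checks. The sketch below assumes the reference time comes from a source the operator already trusts and that thresholds are set by launch policy; the function names and parameters are hypothetical.

```python
import shutil
import time
from typing import Optional


def has_free_disk(path: str, min_free_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def clock_within_drift(
    reference_unix_ms: int,
    max_drift_ms: int,
    local_unix_ms: Optional[int] = None,
) -> bool:
    """True if the local clock is within `max_drift_ms` of the reference time."""
    if local_unix_ms is None:
        local_unix_ms = int(time.time() * 1000)
    return abs(local_unix_ms - reference_unix_ms) <= max_drift_ms


def consensus_launch_blocked(clock_ok: bool, storage_ok: bool) -> bool:
    """Apply rule 10.3: either failure blocks consensus-role launch."""
    return not (clock_ok and storage_ok)
```

Passing `local_unix_ms` explicitly makes the drift check reproducible, so the measured values can be recorded as evidence.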


11. Prelaunch keys and roles checklist

11.1 Class

CL_PRELAUNCH_KEYS_AND_ROLES

11.2 Minimum items

  1. validator identity key/path verified
  2. proposer key mapping verified if proposer active
  3. verifier key mapping verified if verifier active
  4. notary key mapping verified if notary active
  5. signer process reachable
  6. wrong-network key mix-up ruled out
  7. role enablement list explicit
  8. disabled roles explicit
  9. emergency local stop path verified
  10. key rotation/recovery reference available

11.3 Rule

Incorrect role-key mapping => FAIL, not HOLD.
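The role-key mapping items (2 to 4 in 11.2) amount to comparing the configured key reference of every active role against the expected one. A minimal sketch, assuming roles and key references are plain strings; the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def role_key_mismatches(
    active_roles: Iterable[str],
    expected_keys: Dict[str, str],
    configured_keys: Dict[str, str],
) -> List[str]:
    """Return active roles whose configured key ref is missing or differs."""
    return sorted(
        role
        for role in active_roles
        if role not in expected_keys
        or configured_keys.get(role) != expected_keys[role]
    )


def role_key_verdict(
    active_roles: Iterable[str],
    expected_keys: Dict[str, str],
    configured_keys: Dict[str, str],
) -> str:
    """Rule 11.3: any mismatch is FAIL, never HOLD."""
    mismatches = role_key_mismatches(active_roles, expected_keys, configured_keys)
    return "FAIL" if mismatches else "PASS"
```

Only active roles are checked, which mirrors the "if proposer active" style conditions in the item list.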


12. Preflight node checklist

12.1 Class

CL_PREFLIGHT_NODE

12.2 Minimum items

  1. preflight executed
  2. preflight verdict captured
  3. artifact verification stage passed
  4. environment verification stage passed
  5. config semantic validation passed
  6. local state store healthy
  7. telemetry initialized
  8. role activation policy loaded
  9. peer bootstrap config loaded
  10. preflight warnings reviewed

12.3 Rule

Consensus roles SHOULD NOT activate unless preflight verdict is acceptable under launch policy.


13. Launch window final checks checklist

13.1 Class

CL_LAUNCH_WINDOW_FINAL_CHECKS

13.2 Minimum items

  1. freeze confirmed for release artifacts
  2. freeze confirmed for genesis artifacts
  3. no active revocation on critical artifacts
  4. latest advisory review complete
  5. operator readiness still current
  6. monitoring live at launch sensitivity
  7. incident path staffed and reachable
  8. no new critical blocker opened
  9. launch window scope reconfirmed
  10. hold/abort decision path ready

13.3 Rule

Any newly opened critical blocker SHOULD force HOLD or FAIL.


14. Bootstrap start checklist

14.1 Class

CL_BOOTSTRAP_START

14.2 Minimum items

  1. launch authorization record seen
  2. bootstrap instruction seen
  3. exact launch scope matches local node scope
  4. validation-only startup path selected first
  5. network connections established
  6. peer compatibility checks pass
  7. local genesis anchor matches live network view
  8. role activation order understood
  9. logs and metrics recording bootstrap
  10. evidence preservation active

14.3 Rule

Without launch authorization or bootstrap instruction, bootstrap start SHOULD NOT proceed.


15. First blocks checklist

15.1 Class

CL_FIRST_BLOCKS

15.2 Minimum items

  1. first peer compatibility looks healthy
  2. first candidate/block acceptance patterns normal enough
  3. no immediate invalid object spike
  4. no immediate chain_id/genesis mismatch seen
  5. proposer behavior normal if proposer active
  6. verifier behavior normal if verifier active
  7. notary signing not enabled prematurely
  8. logs captured for first block window
  9. anomaly thresholds armed
  10. incident escalation path ready if triggered

15.3 Rule

Unexpected early anomaly in first blocks SHOULD be recorded, not only observed informally.


16. First epochs checklist

16.1 Class

CL_FIRST_EPOCHS

16.2 Minimum items

  1. first finalized epoch observed
  2. finalized root recorded
  3. finality cadence within acceptable band
  4. validator participation acceptable
  5. no unexplained deterministic mismatch reported locally
  6. no no-finality escalation threshold crossed
  7. no BVM critical anomaly seen in launch subset
  8. no witness/proof critical anomaly seen in launch subset
  9. no governance activation anomaly seen
  10. early observation record emitted or logged

16.3 Rule

Failure to observe healthy first finalized epochs SHOULD delay normalization and MAY trigger incident workflow.


17. Restricted posture checklist

17.1 Class

CL_RESTRICTED_POSTURE

17.2 Minimum items

  1. restricted posture entry acknowledged
  2. config freeze maintained
  3. binary freeze maintained
  4. increased snapshot cadence enabled
  5. higher-sensitivity alerts enabled
  6. restart approvals tightened
  7. optional deferred features still disabled
  8. anomaly review cadence active
  9. communication discipline active
  10. exit criteria tracked

17.3 Rule

Restricted posture is an active operating mode, not just a label.


18. Proposer role checklist

18.1 Class

CL_PROPOSER_ROLE

18.2 Minimum items

  1. proposer role explicitly enabled
  2. proposer key reachable
  3. mempool/candidate pool healthy enough
  4. local state current
  5. telemetry for proposal events live
  6. no unresolved validation mismatch
  7. no launch-window local blocker for proposing
  8. proposer log capture active

18.3 Rule

Uncertain local state or bad candidate pool health SHOULD block proposer enablement.


19. Verifier role checklist

19.1 Class

CL_VERIFIER_ROLE

19.2 Minimum items

  1. verifier role explicitly enabled
  2. verifier key reachable
  3. validation path self-check healthy
  4. replay spot-check healthy if required
  5. fraud-proof/evidence logging active
  6. peer/state view current enough
  7. no unresolved deterministic anomaly
  8. verifier telemetry live

19.3 Rule

A verifier that cannot trust its local validation path SHOULD NOT sign.


20. Notary role checklist

20.1 Class

CL_NOTARY_ROLE

20.2 Minimum items

  1. notary role explicitly enabled
  2. notary key reachable and isolated
  3. reexecution path healthy
  4. finality threshold and committee view correct
  5. notarization evidence logging active
  6. no unresolved validation or consensus anomaly
  7. launch policy allows notary activation now
  8. notary operator confirms readiness explicitly

20.3 Rule

The notary role SHOULD have the strictest checklist of the three signing roles.


21. Observer role checklist

21.1 Class

CL_OBSERVER_ROLE

21.2 Minimum items

  1. observer scope explicit
  2. no unintended signing roles active
  3. finalized view healthy
  4. snapshot/archival paths working
  5. metrics and logs active
  6. chain identity confirmed
  7. release/genesis scope confirmed
  8. anomaly reporting path ready

21.3 Rule

Observer-only nodes SHOULD remain provably non-signing.


22. Incident local response checklist

22.1 Class

CL_INCIDENT_LOCAL_RESPONSE

22.2 Minimum items

  1. anomaly classified
  2. exact time and scope recorded
  3. risky signing role disabled if needed
  4. evidence preserved
  5. logs preserved
  6. state roots preserved
  7. incident escalation sent
  8. local safe mode or halt status explicit
  9. replay/recovery decision made
  10. rejoin blocked until conditions met

22.3 Rule

This checklist SHOULD be started immediately after a real anomaly, not after long discussion.


23. Restart and rejoin checklist

23.1 Class

CL_RESTART_AND_REJOIN

23.2 Minimum items

  1. reason for restart/rejoin recorded
  2. artifacts revalidated
  3. config drift check complete
  4. preflight rerun
  5. last trusted checkpoint identified
  6. replay/spot-check completed if needed
  7. node starts validation-only first
  8. signing roles re-enabled only after health checks
  9. post-restart monitoring heightened
  10. restart evidence preserved

23.3 Rule

Automatic rejoin straight into full signing SHOULD be disallowed unless policy explicitly permits and health checks are green.
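Rule 23.3, together with items 7 and 8 of 23.2, can be read as a small gating function. The sketch below is one interpretation, not a normative definition; the parameter names are assumptions.

```python
def may_enable_signing(
    started_validation_only: bool,
    health_checks_green: bool,
    automatic: bool,
    policy_permits_automatic: bool,
) -> bool:
    """Decide whether signing roles may be re-enabled after a restart or rejoin."""
    # Health checks gate every path back into signing.
    if not health_checks_green:
        return False
    # Automatic rejoin into full signing needs explicit policy permission.
    if automatic:
        return policy_permits_automatic
    # Operator-driven rejoin must have gone through validation-only first.
    return started_validation_only
```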


24. Post-launch normalization checklist

24.1 Class

CL_POST_LAUNCH_NORMALIZATION

24.2 Minimum items

  1. stable finality over observation window confirmed
  2. no unresolved launch-critical anomaly remains
  3. operator health acceptable
  4. restricted posture exit approved
  5. snapshot baseline updated
  6. normal alert profile prepared
  7. no pending critical artifact supersession
  8. launch records archived
  9. post-launch review scheduled
  10. normalization record emitted

24.3 Rule

Normalization SHOULD be explicit and evidence-based.


25. Minimal pass/fail rules

25.1 Hard fail items

Items involving:

  • binary hash
  • genesis_hash
  • chain_id
  • package id mismatch
  • key-role mismatch
  • preflight blocked

SHOULD default to FAIL.

25.2 Hold items

Items involving:

  • temporary monitoring issue
  • operator staffing check
  • transient connectivity uncertainty
  • missing non-critical note

MAY default to HOLD depending on launch policy.

25.3 N/A items

N/A is allowed only when the role or feature is genuinely not active. N/A MUST NOT be used to avoid verification of active scope.
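The defaults in 25.1 to 25.3 can be expressed as a small lookup. The item class names below are hypothetical labels for the categories listed above, and treating unknown classes as FAIL is a conservative assumption of this sketch rather than a rule from this document.

```python
from typing import Iterable

# Hypothetical class labels for the hard-fail categories in 25.1.
HARD_FAIL_CLASSES = frozenset({
    "BINARY_HASH", "GENESIS_HASH", "CHAIN_ID",
    "PACKAGE_ID", "KEY_ROLE_MAPPING", "PREFLIGHT_BLOCKED",
})

# Hypothetical class labels for the hold categories in 25.2.
HOLD_CLASSES = frozenset({
    "MONITORING_TEMPORARY", "OPERATOR_STAFFING",
    "CONNECTIVITY_TRANSIENT", "NON_CRITICAL_NOTE",
})


def default_result_for_problem(item_class: str, policy_allows_hold: bool = True) -> str:
    """Default result when a check on `item_class` did not succeed."""
    if item_class in HARD_FAIL_CLASSES:
        return "FAIL"
    if item_class in HOLD_CLASSES and policy_allows_hold:
        return "HOLD"
    # Assumption: anything unclassified fails closed.
    return "FAIL"


def n_a_permitted(item_role: str, active_roles: Iterable[str]) -> bool:
    """N/A is acceptable only when the item's role is not active (25.3)."""
    return item_role not in set(active_roles)
```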


26. Checklist signing and archival

26.1 Critical checklist records SHOULD include:

  • operator signature or attestation if policy requires
  • completed_at timestamp
  • exact scope identifiers
  • evidence refs

26.2 Rule

At minimum, checklist results SHOULD be archived in launch audit package or operator audit trail.


27. Human-usable rendering

27.1 Recommendation

Each checklist template SHOULD exist in two forms:

  1. canonical machine-readable form
  2. concise operator-facing rendering

27.2 Rule

Both forms MUST map to the same logical checklist semantics.

27.3 Goal

Make checklists usable under time pressure without losing auditability.


28. Checklist completion rules

28.1 A checklist SHOULD be considered complete only if:

  • all mandatory items have a result
  • overall verdict is set
  • required evidence refs included for critical items
  • operator identity and scope present
  • start/end timestamps present

28.2 Rule

Incomplete checklist MUST NOT masquerade as pass.
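The completion conditions in 28.1 can be checked mechanically. The sketch below models the record and its items as dictionaries; the `mandatory` and `critical` flags, the exemption of N_A items from the evidence requirement, and the function names are assumptions of this example.

```python
from typing import Dict, List


def completion_gaps(record: Dict, items: List[Dict]) -> List[str]:
    """Return the 28.1 conditions that the checklist does not yet satisfy."""
    gaps = []
    if any(i.get("mandatory", True) and not i.get("result") for i in items):
        gaps.append("mandatory item without result")
    if not record.get("overall_verdict"):
        gaps.append("overall verdict not set")
    # Assumption: an item legitimately marked N_A needs no evidence.
    if any(
        i.get("critical") and i.get("result") != "N_A" and not i.get("evidence_refs")
        for i in items
    ):
        gaps.append("critical item without evidence refs")
    if not (record.get("operator_identity_ref") and record.get("role_scope")):
        gaps.append("operator identity or scope missing")
    if not (record.get("started_at_unix_ms") and record.get("completed_at_unix_ms")):
        gaps.append("start or end timestamp missing")
    return gaps


def reported_verdict(record: Dict, items: List[Dict]) -> str:
    """Rule 28.2: an incomplete checklist is reported as INCOMPLETE, never PASS."""
    return "INCOMPLETE" if completion_gaps(record, items) else record["overall_verdict"]
```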


29. Template anti-patterns

29.1 Systems SHOULD avoid:

  1. huge free-form checklists no one can run live
  2. vague items like “everything looks good”
  3. no difference between hold and fail
  4. no artifact/hash binding
  5. role-specific steps mixed into generic template without scoping
  6. unchecked default pass values
  7. same checklist for observer and notary
  8. no evidence refs for critical items
  9. silent template changes during launch
  10. storing only screenshots instead of structured results

30. Example concise operator rendering pattern

30.1 Good item form

  • Check: recomputed genesis_hash matches expected value
  • Input: genesis package id X
  • Evidence: derived root record / hash output
  • Result: PASS / FAIL / HOLD / N/A

30.2 Bad item form

  • “Genesis seems right”
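The good item form in 30.1 is regular enough to be rendered from structured data, which helps keep the operator-facing text and the machine-readable form aligned (section 27). A minimal sketch; `render_item` and its parameters are hypothetical.

```python
RESULT_OPTIONS = "PASS / FAIL / HOLD / N/A"


def render_item(check: str, input_ref: str, evidence: str, result: str = "") -> str:
    """Render one checklist item in the four-line form from 30.1.

    When no result has been recorded yet, the allowed options are shown.
    """
    return "\n".join([
        f"Check: {check}",
        f"Input: {input_ref}",
        f"Evidence: {evidence}",
        f"Result: {result or RESULT_OPTIONS}",
    ])
```

For example, `render_item("recomputed genesis_hash matches expected value", "genesis package id X", "derived root record / hash output")` reproduces the four lines of the good item form.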

31. Checklist aggregation

31.1 Need

Launch coordinator may need rolled-up views.

31.2 Aggregation SHOULD support:

  • per operator
  • per role
  • per phase
  • per cluster
  • pass/fail counts
  • open hold items
  • hard fail items

31.3 Rule

Aggregated view MUST remain traceable back to individual checklist records.
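A roll-up that stays traceable can simply carry the contributing record ids alongside the counts. The sketch below aggregates at record level by any grouping field (operator, role, phase or cluster); tracking holds and fails per record rather than per item is a simplification, and the dictionary keys are assumptions.

```python
from collections import Counter
from typing import Dict, List


def aggregate(records: List[Dict], group_by: str) -> Dict[str, Dict]:
    """Group checklist records by `group_by`, keeping ids for traceability."""
    rollup: Dict[str, Dict] = {}
    for rec in records:
        bucket = rollup.setdefault(rec[group_by], {
            "verdict_counts": Counter(),
            "open_holds": [],
            "hard_fails": [],
            "record_ids": [],
        })
        record_id = rec["checklist_record_id"]
        verdict = rec["overall_verdict"]
        bucket["verdict_counts"][verdict] += 1
        bucket["record_ids"].append(record_id)
        if verdict == "HOLD":
            bucket["open_holds"].append(record_id)
        elif verdict == "FAIL":
            bucket["hard_fails"].append(record_id)
    return rollup
```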


32. Relationship to launch ceremony

32.1 Rule

Operator checklist completion SHOULD feed into ceremony confirmations. A readiness confirmation without backing checklist SHOULD be discouraged for serious scope.

32.2 Example

validator_role_ready_confirmed should point to:

  • prelaunch artifact checklist
  • preflight node checklist
  • role-specific checklist
  • launch window final checks checklist
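The example in 32.2 suggests a simple backing check before a readiness confirmation is accepted. In this sketch the role names used as dictionary keys and the function name are assumptions; the checklist class names are those defined in section 3.

```python
from typing import Iterable, Set

# Checklists every readiness confirmation should point to (32.2).
COMMON_BACKING = frozenset({
    "CL_PRELAUNCH_ARTIFACTS",
    "CL_PREFLIGHT_NODE",
    "CL_LAUNCH_WINDOW_FINAL_CHECKS",
})

# Role-specific checklist per signing role (role keys are hypothetical).
ROLE_CHECKLIST = {
    "proposer": "CL_PROPOSER_ROLE",
    "verifier": "CL_VERIFIER_ROLE",
    "notary": "CL_NOTARY_ROLE",
}


def missing_backing_checklists(role: str, completed_classes: Iterable[str]) -> Set[str]:
    """Return checklist classes still missing behind a readiness confirmation."""
    required = set(COMMON_BACKING) | {ROLE_CHECKLIST[role]}
    return required - set(completed_classes)
```

An empty result means the confirmation is fully backed; anything else names exactly which checklists are still outstanding.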

33. Relationship to incident response

33.1 Rule

Incident local response checklist SHOULD bridge operator actions into formal incident workflow.

33.2 Benefit

This reduces forgotten steps during stress:

  • preserve evidence
  • disable risky roles
  • escalate
  • prepare replay/rejoin decision

34. Relationship to mainnet audit archive

34.1 Recommendation

Critical completed checklist records SHOULD be included in:

  • launch audit package
  • operator audit package
  • post-launch stabilization review bundle

34.2 Rule

This is especially important for:

  • prelaunch artifact checks
  • preflight
  • first epochs
  • incident local response
  • restricted posture exit

35. Formal goals

AZ-029 pursues the following goals:

35.1 Human error reduction

Operators have concrete, short and phase-appropriate checklists.

35.2 Scope-bound execution

Every checklist is bound to exact artifacts, role and network scope.

35.3 Auditability

Checklist completion can be archived and verified later.

35.4 Safer launch and recovery

Critical operational moments are less dependent on memory and improvisation.


36. Document formula

Concrete Operator Checklists = phase-specific templates + role-specific templates + explicit pass/fail/hold items + evidence refs + scope-bound checklist records


37. Relationship to the rest of the suite

  • AZ-025 defines the broad operator manual.
  • AZ-028 defines the launch window procedure.
  • AZ-029 defines the concrete lists through which these procedures become executable under pressure.

In short: AZ-025 says what must be done; AZ-029 turns it into an operable checklist.


38. What comes next

After AZ-029, the next document is:

AZ-030 — Launch Decision Ledger

That document must define:

  • the formal ledger of launch decisions,
  • the link between go/no-go, hold, abort and evidence,
  • who decided what,
  • on the basis of which artifacts and checklists,
  • and how this history is preserved as a constitutional and operational audit trail.

Closing

Good manuals explain. Good checklists execute.

In critical moments, the winner is not whoever has the most pages of explanation, but whoever has the right list, in the right order, with a clear verdict and sufficient evidence for every step.

That is where real operational discipline begins.