AZ-029 — Concrete Operator Checklists v1
Status
This document turns the operations manual and the launch window procedure into concrete, executable and auditable checklists for ATLAS ZERO operators.
AZ-025 defined the manual for validators and operators. AZ-028 defined the procedure for the real launch window. AZ-029 answers the question: what exact, short and operational list must an operator follow in each critical phase in order to reduce human error and preserve auditability?
The purpose of this document is to fix:
- separate checklists per phase;
- checklists per role;
- pass/fail/hold criteria;
- minimum signing and timestamp fields;
- evidence retention rules;
- the relationship between a checklist and the canonical artifacts.
This document builds on:
- AZ-002 through AZ-028, with direct emphasis on AZ-015, AZ-017, AZ-022, AZ-025, AZ-026 and AZ-028.
Terms:
- MUST = mandatory
- MUST NOT = forbidden
- SHOULD = strongly recommended
- MAY = optional
1. Objective
AZ-029 answers 10 practical questions:
- What minimum checklists does an operator need?
- What is verified before launch?
- What exactly is verified in the launch window?
- What is verified after bootstrap and in the first epochs?
- What separate checklists do the proposer, verifier and notary have?
- How are pass, fail, hold and escalation marked?
- How are checklists signed or archived?
- How is box-ticking without real verification avoided?
- How is a checklist tied to artifacts, hashes and roles?
- How are checklists used in incident, restart and rejoin?
2. Principles
2.1 Checklists are execution tools, not decorative docs
Checklists MUST be short and clear enough to be followed in real time.
2.2 Every line must map to a real verification
A checklist item MUST correspond to a real verification, not to an impression.
2.3 Pass/fail must be explicit
Every item SHOULD have an explicit result:
- pass
- fail
- hold
- n/a
2.4 Checklist truth must bind to exact scope
Critical checklists MUST anchor:
- release package id
- genesis package id
- genesis_hash
- chain_id
- role scope
- timestamp
- operator identity
2.5 Failed checklist item is a signal, not embarrassment
A fail must be treated as a useful operational signal. Hiding it is a more serious defect than the fail itself.
3. Checklist families
3.1 Core families
ATLAS ZERO SHOULD support at least these families:
CL_PRELAUNCH_ARTIFACTS
CL_PRELAUNCH_ENVIRONMENT
CL_PRELAUNCH_KEYS_AND_ROLES
CL_PREFLIGHT_NODE
CL_LAUNCH_WINDOW_FINAL_CHECKS
CL_BOOTSTRAP_START
CL_FIRST_BLOCKS
CL_FIRST_EPOCHS
CL_RESTRICTED_POSTURE
CL_INCIDENT_LOCAL_RESPONSE
CL_RESTART_AND_REJOIN
CL_POST_LAUNCH_NORMALIZATION
3.2 Role-specific families
The following role-specific families SHOULD also exist:
CL_PROPOSER_ROLE
CL_VERIFIER_ROLE
CL_NOTARY_ROLE
CL_OBSERVER_ROLE
4. Checklist object model
4.1 Canonical structure
OperatorChecklistRecord {
version_major
version_minor
checklist_record_id
checklist_class
checklist_template_version
operator_identity_ref
node_identity_ref?
role_scope
target_network_class
target_chain_id
target_genesis_hash
target_release_package_id
target_genesis_package_id
started_at_unix_ms
completed_at_unix_ms?
overall_verdict
item_root
notes_hash?
}
4.2 overall_verdict
PASS
PASS_WITH_NOTES
HOLD
FAIL
ABORTED
INCOMPLETE
4.3 Rule
Critical checklists SHOULD be treated as auditable objects, not just as text ticked off locally.
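As an illustration, the record from 4.1 and the verdict values from 4.2 could be modeled as below. This is a minimal sketch, not a normative encoding: the field names come from the structure above, while the Python types, the string form of identifiers and hashes, and the class name OverallVerdict are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverallVerdict(Enum):
    """Verdict values listed in section 4.2."""
    PASS = "PASS"
    PASS_WITH_NOTES = "PASS_WITH_NOTES"
    HOLD = "HOLD"
    FAIL = "FAIL"
    ABORTED = "ABORTED"
    INCOMPLETE = "INCOMPLETE"


@dataclass(frozen=True)
class OperatorChecklistRecord:
    # Field names follow section 4.1; the Python types are assumptions.
    version_major: int
    version_minor: int
    checklist_record_id: str
    checklist_class: str
    checklist_template_version: str
    operator_identity_ref: str
    role_scope: str
    target_network_class: str
    target_chain_id: str
    target_genesis_hash: str
    target_release_package_id: str
    target_genesis_package_id: str
    started_at_unix_ms: int
    overall_verdict: OverallVerdict
    item_root: str
    # Fields marked "?" in the structure are optional and default to None.
    node_identity_ref: Optional[str] = None
    completed_at_unix_ms: Optional[int] = None
    notes_hash: Optional[str] = None
```

Freezing the dataclass is a design choice for this sketch: a record that cannot be mutated after creation is easier to treat as an auditable object.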
5. Checklist item model
5.1 Canonical structure
ChecklistItemResult {
item_id
item_class
prompt_hash
result
evidence_refs?
timestamp_unix_ms
operator_note_hash?
}
5.2 result
PASS
FAIL
HOLD
N_A
5.3 Rule
PASS SHOULD mean actual verification completed.
It MUST NOT mean “presumed OK”.
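The item model can be sketched as follows, together with one possible way to derive prompt_hash and item_root. This document does not define how either value is computed, so everything here is an assumption made for illustration: SHA-256, canonical JSON encoding, and a flat hash over item hashes ordered by item_id (not a Merkle tree). The helper names hash_prompt and compute_item_root are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional, Tuple

ALLOWED_RESULTS = ("PASS", "FAIL", "HOLD", "N_A")


@dataclass(frozen=True)
class ChecklistItemResult:
    # Field names follow section 5.1; types are assumptions.
    item_id: str
    item_class: str
    prompt_hash: str
    result: str
    timestamp_unix_ms: int
    evidence_refs: Tuple[str, ...] = ()
    operator_note_hash: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject anything outside the four values of section 5.2.
        if self.result not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result: {self.result!r}")


def hash_prompt(prompt_text: str) -> str:
    """ASSUMPTION: prompt_hash is SHA-256 over the UTF-8 prompt text."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def compute_item_root(items: Iterable[ChecklistItemResult]) -> str:
    """ASSUMPTION: flat SHA-256 commitment over per-item hashes, sorted by item_id."""
    leaves = []
    for item in sorted(items, key=lambda entry: entry.item_id):
        encoded = json.dumps(asdict(item), sort_keys=True, separators=(",", ":"))
        leaves.append(hashlib.sha256(encoded.encode("utf-8")).digest())
    return hashlib.sha256(b"".join(leaves)).hexdigest()
```

Sorting by item_id makes the commitment independent of the order in which an operator happened to complete the items, while any change to a result, timestamp or evidence reference changes the root.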
6. Evidence reference policy
6.1 Every critical item SHOULD link to evidence where possible:
- artifact ids
- hashes
- preflight result id
- live status record id
- log refs
- state root refs
- checkpoint record ids
6.2 Rule
For TIER_3/TIER_4 operational checkpoints, evidence-less pass SHOULD be discouraged or forbidden by policy.
7. Template versioning
7.1 Need
Checklists evolve over time.
7.2 Rule
Every checklist MUST reference:
- checklist template class
- template version
7.3 Rule
Template changes near launch SHOULD be tightly controlled and reviewed.
8. Minimal operator identification
8.1 Checklist records SHOULD include:
- operator identity ref
- node identity ref if applicable
- role scope
- environment or cluster label if applicable
8.2 Rule
A critical checklist without an identifiable operator and role has low audit value.
9. Prelaunch artifact checklist
9.1 Class
CL_PRELAUNCH_ARTIFACTS
9.2 Purpose
Verify exact launch artifacts before any node role activation.
9.3 Minimum items
- release package manifest loaded
- release package id matches expected scope
- local binary hash matches approved artifact
- genesis package manifest loaded
- genesis package id matches expected scope
- recomputed genesis_hash matches expected value
- recomputed chain_id matches expected value
- derived roots commitment verified
- attestation sufficiency verified
- release/genesis compatibility verified
9.4 Hold/fail guidance
Any mismatch on binary hash, genesis_hash, chain_id or package id => FAIL.
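The hard-fail rule in 9.4 can be sketched as a comparison of observed values against the expected launch scope. This is only an illustration: SHA-256 as the binary hash function, the dictionary keys, and the function names are assumptions, and recomputation of genesis_hash and chain_id is out of scope here because it is defined elsewhere in the suite.

```python
import hashlib
from pathlib import Path

# Hypothetical keys for the values that section 9.4 treats as hard-fail.
SCOPE_KEYS = (
    "binary_hash",
    "genesis_hash",
    "chain_id",
    "release_package_id",
    "genesis_package_id",
)


def sha256_file(path: Path) -> str:
    """Hash a local binary in chunks (ASSUMPTION: artifacts are pinned by SHA-256)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_artifact_scope(observed: dict, expected: dict) -> dict:
    """Return PASS or FAIL per key; a missing observed value is a FAIL, never a PASS."""
    results = {}
    for key in SCOPE_KEYS:
        value = observed.get(key)
        matches = value is not None and value == expected.get(key)
        results[key] = "PASS" if matches else "FAIL"
    return results


def artifact_verdict(results: dict) -> str:
    """Any single mismatch fails the whole artifact check, as in section 9.4."""
    return "FAIL" if "FAIL" in results.values() else "PASS"
```

Note that there is no HOLD branch: under 9.4 a mismatch on any of these values is a FAIL.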
10. Prelaunch environment checklist
10.1 Class
CL_PRELAUNCH_ENVIRONMENT
10.2 Minimum items
- host identity confirmed
- storage paths writable
- sufficient free disk space
- logging sink reachable
- metrics sink reachable
- snapshot path available
- local clock within tolerated drift
- network/firewall configuration loaded
- resource limits configured
- no undeclared config overrides present
10.3 Rule
Clock drift or broken persistent storage SHOULD block consensus-role launch.
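Three of the items above (writable storage, free disk space, clock drift) and the blocking rule in 10.3 lend themselves to an automated check. The sketch below assumes the caller supplies a trusted reference time and the thresholds; how the reference time is obtained and what drift is tolerated are not defined in this document, and the function name is hypothetical.

```python
import os
import shutil


def check_environment(storage_path, min_free_bytes, local_unix_ms,
                      reference_unix_ms, max_drift_ms):
    """Evaluate three section 10.2 items and the section 10.3 blocking rule.

    Returns (results, consensus_launch_blocked).
    """
    writable = os.path.isdir(storage_path) and os.access(storage_path, os.W_OK)
    # Only query free space when the path exists, otherwise disk_usage raises.
    free_bytes = shutil.disk_usage(storage_path).free if writable else 0
    drift_ms = abs(local_unix_ms - reference_unix_ms)

    results = {
        "storage_paths_writable": "PASS" if writable else "FAIL",
        "sufficient_free_disk_space":
            "PASS" if writable and free_bytes >= min_free_bytes else "FAIL",
        "local_clock_within_tolerated_drift":
            "PASS" if drift_ms <= max_drift_ms else "FAIL",
    }
    # Section 10.3: clock drift or broken storage blocks consensus-role launch.
    blocked = (
        results["storage_paths_writable"] == "FAIL"
        or results["local_clock_within_tolerated_drift"] == "FAIL"
    )
    return results, blocked
```

The remaining items in 10.2 (sinks, firewall configuration, resource limits, config overrides) depend on the deployment and are deliberately left out of this sketch.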
11. Prelaunch keys and roles checklist
11.1 Class
CL_PRELAUNCH_KEYS_AND_ROLES
11.2 Minimum items
- validator identity key/path verified
- proposer key mapping verified if proposer active
- verifier key mapping verified if verifier active
- notary key mapping verified if notary active
- signer process reachable
- wrong-network key mix-up ruled out
- role enablement list explicit
- disabled roles explicit
- emergency local stop path verified
- key rotation/recovery reference available
11.3 Rule
Incorrect role-key mapping => FAIL, not HOLD.
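The per-role key mapping items and the rule in 11.3 can be sketched as below. The representation is an assumption: keys are identified by an opaque fingerprint string per role, and the expected mapping comes from whatever source the launch policy designates. The function name is hypothetical.

```python
SIGNING_ROLES = ("proposer", "verifier", "notary")


def check_key_mapping(enabled_roles: set, configured: dict, expected: dict) -> dict:
    """Return one result per signing role.

    configured / expected map role name -> key fingerprint (hypothetical format).
    """
    results = {}
    for role in SIGNING_ROLES:
        item_id = f"{role}_key_mapping_verified"
        if role not in enabled_roles:
            # N_A is used only because the role is genuinely not active.
            results[item_id] = "N_A"
            continue
        fingerprint = configured.get(role)
        if fingerprint is not None and fingerprint == expected.get(role):
            results[item_id] = "PASS"
        else:
            # Section 11.3: an incorrect mapping is FAIL, never HOLD.
            results[item_id] = "FAIL"
    return results
```

For example, with only the verifier role enabled and a matching fingerprint, the proposer and notary items come back as N_A and the verifier item as PASS; a missing or different fingerprint for an enabled role yields FAIL.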
12. Preflight node checklist
12.1 Class
CL_PREFLIGHT_NODE
12.2 Minimum items
- preflight executed
- preflight verdict captured
- artifact verification stage passed
- environment verification stage passed
- config semantic validation passed
- local state store healthy
- telemetry initialized
- role activation policy loaded
- peer bootstrap config loaded
- preflight warnings reviewed
12.3 Rule
Consensus roles SHOULD NOT activate unless preflight verdict is acceptable under launch policy.
13. Launch window final checks checklist
13.1 Class
CL_LAUNCH_WINDOW_FINAL_CHECKS
13.2 Minimum items
- freeze confirmed for release artifacts
- freeze confirmed for genesis artifacts
- no active revocation on critical artifacts
- latest advisory review complete
- operator readiness still current
- monitoring live at launch sensitivity
- incident path staffed and reachable
- no new critical blocker opened
- launch window scope reconfirmed
- hold/abort decision path ready
13.3 Rule
Any newly opened critical blocker SHOULD force HOLD or FAIL.
14. Bootstrap start checklist
14.1 Class
CL_BOOTSTRAP_START
14.2 Minimum items
- launch authorization record seen
- bootstrap instruction seen
- exact launch scope matches local node scope
- validation-only startup path selected first
- network connections established
- peer compatibility checks pass
- local genesis anchor matches live network view
- role activation order understood
- logs and metrics recording bootstrap
- evidence preservation active
14.3 Rule
Without launch authorization or bootstrap instruction, bootstrap start SHOULD NOT proceed.
15. First blocks checklist
15.1 Class
CL_FIRST_BLOCKS
15.2 Minimum items
- first peer compatibility looks healthy
- first candidate/block acceptance patterns normal enough
- no immediate invalid object spike
- no immediate chain_id/genesis mismatch seen
- proposer behavior normal if proposer active
- verifier behavior normal if verifier active
- notary signing not enabled prematurely
- logs captured for first block window
- anomaly thresholds armed
- incident escalation path ready if triggered
15.3 Rule
Unexpected early anomaly in first blocks SHOULD be recorded, not only observed informally.
16. First epochs checklist
16.1 Class
CL_FIRST_EPOCHS
16.2 Minimum items
- first finalized epoch observed
- finalized root recorded
- finality cadence within acceptable band
- validator participation acceptable
- no unexplained deterministic mismatch reported locally
- no no-finality escalation threshold crossed
- no BVM critical anomaly seen in launch subset
- no witness/proof critical anomaly seen in launch subset
- no governance activation anomaly seen
- early observation record emitted or logged
16.3 Rule
Failure to observe healthy first finalized epochs SHOULD delay normalization and MAY trigger incident workflow.
17. Restricted posture checklist
17.1 Class
CL_RESTRICTED_POSTURE
17.2 Minimum items
- restricted posture entry acknowledged
- config freeze maintained
- binary freeze maintained
- increased snapshot cadence enabled
- higher-sensitivity alerts enabled
- restart approvals tightened
- optional deferred features still disabled
- anomaly review cadence active
- communication discipline active
- exit criteria tracked
17.3 Rule
Restricted posture is an active operating mode, not just a label.
18. Proposer role checklist
18.1 Class
CL_PROPOSER_ROLE
18.2 Minimum items
- proposer role explicitly enabled
- proposer key reachable
- mempool/candidate pool healthy enough
- local state current
- telemetry for proposal events live
- no unresolved validation mismatch
- no launch-window local blocker for proposing
- proposer log capture active
18.3 Rule
Uncertain local state or bad candidate pool health SHOULD block proposer enablement.
19. Verifier role checklist
19.1 Class
CL_VERIFIER_ROLE
19.2 Minimum items
- verifier role explicitly enabled
- verifier key reachable
- validation path self-check healthy
- replay spot-check healthy if required
- fraud-proof/evidence logging active
- peer/state view current enough
- no unresolved deterministic anomaly
- verifier telemetry live
19.3 Rule
A verifier that cannot trust its local validation path SHOULD NOT sign.
20. Notary role checklist
20.1 Class
CL_NOTARY_ROLE
20.2 Minimum items
- notary role explicitly enabled
- notary key reachable and isolated
- reexecution path healthy
- finality threshold and committee view correct
- notarization evidence logging active
- no unresolved validation or consensus anomaly
- launch policy allows notary activation now
- notary operator confirms readiness explicitly
20.3 Rule
Notary role SHOULD be the strictest checklist of the three signing roles.
21. Observer role checklist
21.1 Class
CL_OBSERVER_ROLE
21.2 Minimum items
- observer scope explicit
- no unintended signing roles active
- finalized view healthy
- snapshot/archival paths working
- metrics and logs active
- chain identity confirmed
- release/genesis scope confirmed
- anomaly reporting path ready
21.3 Rule
Observer-only nodes SHOULD remain provably non-signing.
22. Incident local response checklist
22.1 Class
CL_INCIDENT_LOCAL_RESPONSE
22.2 Minimum items
- anomaly classified
- exact time and scope recorded
- risky signing role disabled if needed
- evidence preserved
- logs preserved
- state roots preserved
- incident escalation sent
- local safe mode or halt status explicit
- replay/recovery decision made
- rejoin blocked until conditions met
22.3 Rule
This checklist SHOULD be started immediately after a real anomaly, not after long discussion.
23. Restart and rejoin checklist
23.1 Class
CL_RESTART_AND_REJOIN
23.2 Minimum items
- reason for restart/rejoin recorded
- artifacts revalidated
- config drift check complete
- preflight rerun
- last trusted checkpoint identified
- replay/spot-check completed if needed
- node starts validation-only first
- signing roles re-enabled only after health checks
- post-restart monitoring heightened
- restart evidence preserved
23.3 Rule
Automatic rejoin straight into full signing SHOULD be disallowed unless policy explicitly permits and health checks are green.
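The gating logic behind 23.2 and 23.3 can be sketched as a single decision function. The flag names below are hypothetical and mirror a subset of the checklist items; how each flag is established is left to the operator tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejoinState:
    # Each flag mirrors one checklist item from section 23.2 (names are assumptions).
    artifacts_revalidated: bool
    config_drift_check_complete: bool
    preflight_rerun_passed: bool
    validation_only_started: bool
    health_checks_green: bool
    # Explicit policy exception from section 23.3; off unless policy says otherwise.
    policy_allows_direct_signing: bool = False


def may_enable_signing(state: RejoinState) -> bool:
    """Decide whether signing roles may be re-enabled after a restart."""
    prerequisites = (
        state.artifacts_revalidated
        and state.config_drift_check_complete
        and state.preflight_rerun_passed
        and state.health_checks_green
    )
    if not prerequisites:
        return False
    # Normal path: validation-only first. Direct signing needs explicit policy.
    return state.validation_only_started or state.policy_allows_direct_signing
```

Green health checks are required on both paths, so a policy exception can skip the validation-only phase but never the health checks.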
24. Post-launch normalization checklist
24.1 Class
CL_POST_LAUNCH_NORMALIZATION
24.2 Minimum items
- stable finality over observation window confirmed
- no unresolved launch-critical anomaly remains
- operator health acceptable
- restricted posture exit approved
- snapshot baseline updated
- normal alert profile prepared
- no pending critical artifact supersession
- launch records archived
- post-launch review scheduled
- normalization record emitted
24.3 Rule
Normalization SHOULD be explicit and evidence-based.
25. Minimal pass/fail rules
25.1 Hard fail items
Items involving:
- binary hash
- genesis_hash
- chain_id
- package id mismatch
- key-role mismatch
- preflight blocked
SHOULD default to FAIL.
25.2 Hold items
Items involving:
- temporary monitoring issue
- operator staffing check
- transient connectivity uncertainty
- missing non-critical note
MAY default to HOLD, depending on launch policy.
25.3 N/A items
Allowed only when role or feature genuinely not active.
N/A MUST NOT be used to avoid verification of active scope.
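The defaults in 25.1 to 25.3 can be sketched as a small lookup. The topic labels are hypothetical names for the bullet points above, and treating an unlisted topic as FAIL is a conservative assumption of this sketch, not a rule stated in this document.

```python
# Hypothetical labels for the bullet points in sections 25.1 and 25.2.
HARD_FAIL_TOPICS = frozenset({
    "binary_hash", "genesis_hash", "chain_id",
    "package_id", "key_role_mapping", "preflight_blocked",
})
HOLD_TOPICS = frozenset({
    "temporary_monitoring_issue", "operator_staffing_check",
    "transient_connectivity_uncertainty", "missing_non_critical_note",
})


def default_result_on_problem(topic: str, policy_allows_hold: bool = True) -> str:
    """Default result when an item in the given topic does not verify."""
    if topic in HARD_FAIL_TOPICS:
        return "FAIL"
    if topic in HOLD_TOPICS and policy_allows_hold:
        return "HOLD"
    # ASSUMPTION: unknown topics, or HOLD disallowed by policy, fall back to FAIL.
    return "FAIL"


def n_a_is_allowed(item_role, active_roles: set) -> bool:
    """N/A is acceptable only for an item scoped to a role that is not active.

    item_role is None for generic items, which therefore can never be N/A here.
    """
    return item_role is not None and item_role not in active_roles
```

A fuller version would also cover features, since 25.3 allows N/A for inactive features as well as inactive roles.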
26. Checklist signing and archival
26.1 Critical checklist records SHOULD include:
- operator signature or attestation if policy requires
- completed_at timestamp
- exact scope identifiers
- evidence refs
26.2 Rule
At minimum, checklist results SHOULD be archived in launch audit package or operator audit trail.
27. Human-usable rendering
27.1 Recommendation
Each checklist template SHOULD exist in two forms:
- canonical machine-readable form
- concise operator-facing rendering
27.2 Rule
Both forms MUST map to the same logical checklist semantics.
27.3 Goal
Make checklists usable under time pressure without losing auditability.
28. Checklist completion rules
28.1 A checklist SHOULD be considered complete only if:
- all mandatory items have result
- overall verdict is set
- required evidence refs included for critical items
- operator identity and scope present
- start/end timestamps present
28.2 Rule
Incomplete checklist MUST NOT masquerade as pass.
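The completion conditions in 28.1 can be checked mechanically before a verdict is accepted. The sketch below works on plain dictionaries keyed by the field names from sections 4 and 5; the sets of mandatory and critical item ids are assumed to come from the checklist template, and the function names are hypothetical.

```python
VALID_RESULTS = {"PASS", "FAIL", "HOLD", "N_A"}
REQUIRED_RECORD_FIELDS = (
    "operator_identity_ref", "role_scope",
    "target_chain_id", "target_genesis_hash",
    "target_release_package_id", "target_genesis_package_id",
    "started_at_unix_ms", "completed_at_unix_ms",
)


def completion_problems(record: dict, items: list,
                        mandatory_ids: set, critical_ids: set) -> list:
    """List every reason the checklist cannot yet count as complete."""
    problems = []
    by_id = {item["item_id"]: item for item in items}
    for item_id in sorted(mandatory_ids):
        if by_id.get(item_id, {}).get("result") not in VALID_RESULTS:
            problems.append(f"missing result: {item_id}")
    if not record.get("overall_verdict"):
        problems.append("overall verdict not set")
    for item_id in sorted(critical_ids):
        item = by_id.get(item_id)
        if item is not None and not item.get("evidence_refs"):
            problems.append(f"missing evidence: {item_id}")
    for name in REQUIRED_RECORD_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing field: {name}")
    return problems


def effective_verdict(record: dict, problems: list) -> str:
    """Section 28.2: an incomplete checklist is reported as INCOMPLETE, never PASS."""
    return "INCOMPLETE" if problems else record["overall_verdict"]
```

Returning the list of problems rather than a boolean lets the operator-facing rendering show exactly what is still missing.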
29. Template anti-patterns
29.1 Systems SHOULD avoid:
- huge free-form checklists no one can run live
- vague items like “everything looks good”
- no difference between hold and fail
- no artifact/hash binding
- role-specific steps mixed into generic template without scoping
- unchecked default pass values
- same checklist for observer and notary
- no evidence refs for critical items
- silent template changes during launch
- storing only screenshots instead of structured results
30. Example concise operator rendering pattern
30.1 Good item form
- Check: recomputed genesis_hash matches expected value
- Input: genesis package id X
- Evidence: derived root record / hash output
- Result: PASS / FAIL / HOLD / N/A
30.2 Bad item form
- “Genesis seems right”
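The good item form from 30.1 can be produced from structured data with a small renderer, so that the operator-facing text and the machine-readable item stay in step. This is a sketch only; the function name and parameters are hypothetical.

```python
RESULT_LABELS = ("PASS", "FAIL", "HOLD", "N/A")


def render_item(check: str, input_ref: str, evidence_hint: str, result=None) -> str:
    """Render one checklist item in the four-line form of section 30.1.

    With result=None the full set of choices is shown, as on a blank checklist.
    """
    if result is not None and result not in RESULT_LABELS:
        raise ValueError(f"unknown result: {result!r}")
    shown = result if result is not None else " / ".join(RESULT_LABELS)
    return "\n".join([
        f"- Check: {check}",
        f"- Input: {input_ref}",
        f"- Evidence: {evidence_hint}",
        f"- Result: {shown}",
    ])


if __name__ == "__main__":
    print(render_item(
        "recomputed genesis_hash matches expected value",
        "genesis package id X",
        "derived root record / hash output",
    ))
```

Run as a script, this prints the same four lines shown in 30.1. A vague prompt such as the one in 30.2 has no input or evidence to fill in, which is what makes it unusable as a checklist item.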
31. Checklist aggregation
31.1 Need
Launch coordinator may need rolled-up views.
31.2 Aggregation SHOULD support:
- per operator
- per role
- per phase
- per cluster
- pass/fail counts
- open hold items
- hard fail items
31.3 Rule
Aggregated view MUST remain traceable back to individual checklist records.
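A rolled-up view that stays traceable can be sketched as a grouping that keeps record ids and item ids alongside the counts. The input shape (dictionaries with an items list) and the function name are assumptions; the grouping key can be any record field, for example operator_identity_ref, role_scope or checklist_class.

```python
from collections import Counter, defaultdict


def aggregate(records: list, key: str) -> dict:
    """Group checklist records by one record field and roll up item results.

    Every hold or fail is kept as (checklist_record_id, item_id) so the
    aggregated view can be traced back to the individual record (section 31.3).
    """
    groups = defaultdict(lambda: {
        "record_ids": [],
        "results": Counter(),
        "open_holds": [],
        "hard_fails": [],
    })
    for record in records:
        group = groups[record[key]]
        group["record_ids"].append(record["checklist_record_id"])
        for item in record["items"]:
            group["results"][item["result"]] += 1
            ref = (record["checklist_record_id"], item["item_id"])
            if item["result"] == "HOLD":
                group["open_holds"].append(ref)
            elif item["result"] == "FAIL":
                group["hard_fails"].append(ref)
    return dict(groups)
```

For instance, aggregate(records, "role_scope") gives the launch coordinator one entry per role with pass/fail counts, open hold items and failed items, each pointing at its source record.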
32. Relationship to launch ceremony
32.1 Rule
Operator checklist completion SHOULD feed into ceremony confirmations. A readiness confirmation without backing checklist SHOULD be discouraged for serious scope.
32.2 Example
validator_role_ready_confirmed should point to:
- prelaunch artifact checklist
- preflight node checklist
- role-specific checklist
- launch window final checks checklist
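The backing described in this example can be checked before a confirmation is accepted. In the sketch below, the mapping from role to role-specific checklist class follows section 3.2, while the requirement that each backing checklist has a PASS or PASS_WITH_NOTES verdict is an assumption of this sketch; the document itself only says the confirmation should point to these checklists. The function name is hypothetical.

```python
ROLE_CHECKLIST = {
    "proposer": "CL_PROPOSER_ROLE",
    "verifier": "CL_VERIFIER_ROLE",
    "notary": "CL_NOTARY_ROLE",
    "observer": "CL_OBSERVER_ROLE",
}
# ASSUMPTION: only these verdicts count as valid backing for a confirmation.
ACCEPTED_VERDICTS = ("PASS", "PASS_WITH_NOTES")


def missing_backing(role: str, records: list) -> list:
    """Return the checklist classes still missing behind a readiness confirmation."""
    required = [
        "CL_PRELAUNCH_ARTIFACTS",
        "CL_PREFLIGHT_NODE",
        ROLE_CHECKLIST[role],
        "CL_LAUNCH_WINDOW_FINAL_CHECKS",
    ]
    backed = {
        record["checklist_class"]
        for record in records
        if record["overall_verdict"] in ACCEPTED_VERDICTS
    }
    return [cls for cls in required if cls not in backed]
```

An empty list means the confirmation is fully backed; anything else names the checklist that still has to be completed or resolved first.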
33. Relationship to incident response
33.1 Rule
Incident local response checklist SHOULD bridge operator actions into formal incident workflow.
33.2 Benefit
This reduces forgotten steps during stress:
- preserve evidence
- disable risky roles
- escalate
- prepare replay/rejoin decision
34. Relationship to mainnet audit archive
34.1 Recommendation
Critical completed checklist records SHOULD be included in:
- launch audit package
- operator audit package
- post-launch stabilization review bundle
34.2 Rule
This is especially important for:
- prelaunch artifact checks
- preflight
- first epochs
- incident local response
- restricted posture exit
35. Formal goals
AZ-029 pursues these goals:
35.1 Human error reduction
Operators have concrete, short and phase-appropriate checklists.
35.2 Scope-bound execution
Every checklist is bound to exact artifacts, role and network scope.
35.3 Auditability
Checklist completion can be archived and verified later.
35.4 Safer launch and recovery
Critical operational moments are less dependent on memory and improvisation.
36. Document formula
Concrete Operator Checklists = phase-specific templates + role-specific templates + explicit pass/fail/hold items + evidence refs + scope-bound checklist records
37. Relationship to the rest of the suite
- AZ-025 defines the broad manual for operators.
- AZ-028 defines the launch window procedure.
- AZ-029 defines the concrete lists through which these procedures become executable under pressure.
In short: AZ-025 says what must be done; AZ-029 turns it into an operable checklist.
38. What comes next
After AZ-029, the next document is:
AZ-030 — Launch Decision Ledger
It should fix:
- the formal register of launch decisions,
- the link between go/no-go, hold, abort and evidence,
- who decided what,
- based on which artifacts and checklists,
- and how this history is preserved as a constitutional and operational audit trail.
Closing
Good manuals explain. Good checklists execute.
In critical moments, the winner is not whoever has the most pages of explanation, but whoever has the right list, in the right order, with a clear verdict and sufficient evidence for every step.
That is where real operational discipline begins.