AZ-025 — Validator and Operator Launch Manual v1

Status

This document defines the operational launch manual for validators and operators in the ATLAS ZERO ecosystem.

After AZ-001 through AZ-024, the following already exist:

  • the protocol specification;
  • the rules for validation, consensus, BVM, witness, economics, and governance;
  • the security model and incident runbooks;
  • the concrete genesis package;
  • the artifact vault and release pipeline;
  • the concrete conformance corpus;
  • the implementation milestones.

AZ-025 answers the question: what exactly must a validator or operator do to start a node correctly, safely, and verifiably before, during, and immediately after launch?

The purpose of this document is to fix:

  • the exact pre-launch steps;
  • verification of the release package and genesis package;
  • preparation of keys and environment;
  • integrity checks before boot;
  • activation of consensus roles;
  • monitoring of the first epochs;
  • behavior in safe mode, incident, and recovery at network start.

This document builds on:

  • AZ-002 through AZ-024, with direct emphasis on AZ-015, AZ-016, AZ-017, AZ-021, and AZ-022.

Terms:

  • MUST = mandatory
  • MUST NOT = forbidden
  • SHOULD = strongly recommended
  • MAY = optional

1. Objective

AZ-025 answers 10 operational questions:

  1. What does the operator verify before starting the node?
  2. How does the operator validate the release package and genesis package?
  3. How does the operator prepare the keys, configuration, and runtime environment?
  4. How does the operator start the node without introducing local ambiguity?
  5. How does the operator activate the proposer/verifier/notary roles?
  6. Which checks are performed in the launch window and the first epochs?
  7. How does the operator respond to mismatches, lack of finality, or invalid packages?
  8. How does the operator enter safe mode or local halt without falsifying protocol reality?
  9. How are bootstrap, restart, and rejoin executed safely?
  10. What evidence and logs must be kept for audit?

2. Principles

2.1 Verify before boot

Before starting in a validator role, a node MUST verify:

  • the release package,
  • the genesis package,
  • the binary,
  • the configuration,
  • the keys.

2.2 No local reinterpretation

The operator MUST NOT "guess":

  • which genesis is correct,
  • which release is correct,
  • which parameters should "probably" be used.

Everything must be verified from the canonical artifacts.

2.3 Separate local health from protocol truth

Local node problems MUST be treated as one of the following, not as changes to protocol truth:

  • local degradation,
  • local service halt,
  • withdrawal from role.

2.4 Start conservative

In the launch window and the first epochs, the operator SHOULD favor the following over aggressive availability or performance tuning:

  • safety,
  • verification,
  • observability.

2.5 Preserve evidence

Any significant anomaly MUST be logged and preserved for:

  • incident response,
  • audit,
  • replay,
  • a possible fraud proof.

3. Role classes covered by this manual

3.1 Covered roles

The manual primarily covers:

  • full validation node operator
  • validator operator
  • proposer operator
  • verifier operator
  • notary operator
  • archival/observer operator with launch duties

3.2 Additional operators

Some sections may also help:

  • genesis custodian
  • release manager
  • validator bootstrap coordinator
  • recovery operator

3.3 Rule

Every operator MUST know exactly which roles are activated. Consensus roles are never started "implicitly".


4. Launch phases

4.1 Operational phases

Operators SHOULD treat the launch as a sequence of phases:

  1. Artifact Intake
  2. Local Verification
  3. Environment Preparation
  4. Node Preflight
  5. Bootstrap Start
  6. Role Activation
  7. Launch Window Monitoring
  8. Early Epoch Stabilization
  9. Restricted Post-Launch Operation
  10. Normalization

4.2 Rule

Transitions between phases SHOULD be explicit and verifiable.


5. Required artifacts before launch

5.1 Every validator/operator SHOULD possess:

  • exact release package
  • exact genesis package
  • exact release manifest
  • exact genesis package manifest
  • exact conformance corpus reference or launch-critical conformance evidence
  • exact operator guide bundle if distributed separately
  • signed/checksummed binary artifacts
  • launch window instructions
  • incident escalation contacts or hashes/references if policy uses them

5.2 Role-specific extras

A validator operator SHOULD also have:

  • validator identity ref
  • proposer/verifier/notary key refs or secure access to them
  • local role configuration
  • network boot peers or discovery config
  • monitoring and log sink config

5.3 Rule

Missing mandatory launch artifacts MUST be treated as a blocker before any node role is activated.


6. Artifact verification checklist

6.1 Before boot, operator MUST verify:

  1. release package manifest
  2. release artifact hashes
  3. release approvals/attestations
  4. genesis package manifest
  5. genesis package hashes
  6. genesis package attestations
  7. exact genesis_hash
  8. exact chain_id
  9. compatibility between release package and genesis package
  10. local binary hash matches approved release artifact

6.2 Rule

A validator SHOULD NOT join using a package set that has not been explicitly matched and validated.
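
Item 10 of the checklist (local binary hash matches the approved release artifact) can be sketched as follows. This is a minimal illustration, not the project's tooling: the manual does not name the hash algorithm behind content_hash, so SHA-256 is an assumption here, and the function names `sha256_file` and `binary_matches_release` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 (assumed algorithm) and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binary_matches_release(binary_path: Path, approved_hash_hex: str) -> bool:
    """Return True only if the local binary hashes to the approved value."""
    actual = sha256_file(binary_path)
    # Normalize whitespace and case of the manifest value before comparing.
    expected = approved_hash_hex.strip().lower()
    return hmac.compare_digest(actual, expected)
```

The approved hash is expected to come from the verified release manifest, never from the same location the binary was downloaded from.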


7. Release package verification steps

7.1 Minimum checks

Operator SHOULD:

  1. verify published release manifest authenticity
  2. verify required artifacts present
  3. verify binary content_hash and canonical identity
  4. verify release candidate / final release approvals
  5. verify no revocation on release artifact set
  6. verify scope lock to intended target network

7.2 Rule

If release package validation fails, the node MUST NOT start in a consensus role.


8. Genesis package verification steps

8.1 Minimum checks

Operator MUST:

  1. load genesis package manifest
  2. verify package manifest integrity
  3. verify all required artifacts present
  4. recompute artifact hashes
  5. validate genesis_spec.blob
  6. recompute genesis_hash
  7. recompute chain_id
  8. recompute derived roots
  9. verify validator set bundle
  10. verify parameter state bundle
  11. verify registry and policy bundles
  12. verify attestation sufficiency

8.2 Rule

Any mismatch in genesis_hash, chain_id, derived roots, or the validator bundle MUST be treated as a hard stop.
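
The hard-stop rule can be illustrated with a small comparison helper. This sketch assumes the recomputed values have already been derived by the node software; how genesis_hash, chain_id, and the roots are actually computed is defined elsewhere in the suite and is not reproduced here. The types `GenesisIdentity` and `GenesisHardStop` and the field `validator_bundle_hash` are hypothetical names.

```python
from dataclasses import dataclass


class GenesisHardStop(Exception):
    """Raised when a recomputed genesis value disagrees with the expected one."""


@dataclass(frozen=True)
class GenesisIdentity:
    genesis_hash: str
    chain_id: str
    derived_roots: tuple          # ordered root hashes; layout is an assumption
    validator_bundle_hash: str    # hypothetical digest of the validator set bundle


CHECKED_FIELDS = ("genesis_hash", "chain_id", "derived_roots", "validator_bundle_hash")


def find_mismatches(expected: GenesisIdentity, recomputed: GenesisIdentity) -> list:
    """Return the names of all fields whose values differ."""
    return [
        name
        for name in CHECKED_FIELDS
        if getattr(expected, name) != getattr(recomputed, name)
    ]


def enforce_genesis_match(expected: GenesisIdentity, recomputed: GenesisIdentity) -> None:
    """Raise GenesisHardStop on any mismatch; return silently otherwise."""
    mismatches = find_mismatches(expected, recomputed)
    if mismatches:
        raise GenesisHardStop("genesis mismatch in: " + ", ".join(mismatches))
```

Reporting every mismatching field at once, rather than stopping at the first, gives the escalation package more to work with.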


9. Compatibility check between release and genesis

9.1 Operator MUST verify:

  • release package target network class == genesis package target network class
  • release package chain_id compatibility == genesis package chain_id
  • release binary protocol version supports genesis parameter state
  • mainnet/public-testnet scope matches intended launch scope
  • no superseded/revoked artifact still used

9.2 Rule

A node binary validated for one genesis scope MUST NOT be assumed valid for another launch scope without explicit linkage.
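
A minimal sketch of the compatibility check in 9.1, assuming both packages have already been verified individually. The record types, field names, and failure labels are illustrative assumptions; the real manifests may expose this information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseInfo:
    network_class: str                   # e.g. "mainnet" or "public-testnet"
    chain_id: str
    supported_param_versions: frozenset  # assumed way to express protocol support
    revoked: bool = False


@dataclass(frozen=True)
class GenesisInfo:
    network_class: str
    chain_id: str
    param_state_version: int
    revoked: bool = False


def compatibility_failures(release: ReleaseInfo, genesis: GenesisInfo, intended_scope: str) -> list:
    """Return a list of failure labels; an empty list means the pair is compatible."""
    failures = []
    if release.network_class != genesis.network_class:
        failures.append("network_class_mismatch")
    if release.chain_id != genesis.chain_id:
        failures.append("chain_id_mismatch")
    if genesis.param_state_version not in release.supported_param_versions:
        failures.append("unsupported_parameter_state")
    if intended_scope != genesis.network_class:
        failures.append("scope_mismatch")
    if release.revoked or genesis.revoked:
        failures.append("revoked_artifact")
    return failures
```

Any non-empty result would keep the node out of consensus roles until the linkage is made explicit.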


10. Local environment preparation

10.1 Operator SHOULD prepare:

  • isolated host or controlled environment
  • pinned configuration
  • correct clock synchronization
  • storage paths
  • snapshot/replay capacity
  • logging sink
  • metrics sink
  • alert channels
  • network/firewall rules
  • secure key access path

10.2 Rule

Consensus-role nodes SHOULD run in an environment with minimal undeclared mutable state.


11. Key preparation

11.1 Required key classes as applicable

  • validator identity key
  • proposer signing key
  • verifier signing key
  • notary signing key
  • admin key for local service controls if used
  • emergency local stop controls if used operationally

11.2 Rule

Role keys SHOULD be separated. Using the same hot key for all roles SHOULD be avoided, especially for a production launch.

11.3 Pre-launch checks

Operator MUST verify:

  • correct key loaded for correct role
  • no wrong-network key mix-up
  • key material accessibility path works
  • signer process integrity
  • backup/recovery or rotation plan exists

12. Time and environment sanity

12.1 Operator MUST verify:

  • local clock within tolerated drift
  • timezone assumptions not affecting protocol configuration
  • host identity and networking consistent
  • adequate disk space
  • write permissions to required paths
  • snapshot path available
  • log sink writable

12.2 Rule

A node with broken time sync or unstable storage SHOULD NOT enter consensus role.
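
Some of the checks in 12.1 can be automated. The sketch below covers clock drift, path existence, write permission, and free disk space. It assumes the clock offset has already been measured by an external time source (for example the host's NTP client), because the manual does not define the tolerated drift or how it is measured. All names and thresholds are hypothetical.

```python
import os
import shutil


def environment_failures(
    data_dir: str,
    min_free_bytes: int,
    clock_offset_seconds: float,
    max_drift_seconds: float,
) -> list:
    """Return failure labels for the local environment; empty means sane."""
    failures = []
    # The offset is measured externally; only the tolerance is checked here.
    if abs(clock_offset_seconds) > max_drift_seconds:
        failures.append("clock_drift")
    if not os.path.isdir(data_dir):
        failures.append("data_dir_missing")
        return failures
    if not os.access(data_dir, os.W_OK):
        failures.append("data_dir_not_writable")
    if shutil.disk_usage(data_dir).free < min_free_bytes:
        failures.append("low_disk_space")
    return failures
```

The same function can be called once per required path (state store, snapshot path, log sink) with a threshold appropriate to each.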


13. Local configuration policy

13.1 Local config SHOULD include only:

  • node role settings
  • network endpoints / peers
  • storage paths
  • telemetry endpoints
  • safe mode / local halt controls
  • key access references
  • resource limits

13.2 Local config MUST NOT redefine:

  • genesis truth
  • protocol parameters
  • chain identity
  • release identity
  • activation boundaries

13.3 Rule

If local config appears to override protocol truth, the launch must stop and the operator configuration must be reviewed.
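
The policy in 13.1 and 13.2 can be expressed as a check on the top-level sections of a local configuration. The section names below are assumptions chosen to mirror the two lists; a real node's configuration schema may use different keys.

```python
# Sections a local config may contain (13.1); names are illustrative.
ALLOWED_SECTIONS = frozenset({
    "roles", "network", "storage", "telemetry",
    "safe_mode", "key_refs", "resource_limits",
})

# Sections that would redefine protocol truth (13.2); names are illustrative.
FORBIDDEN_SECTIONS = frozenset({
    "genesis", "protocol_params", "chain_id",
    "release_id", "activation_boundaries",
})


def review_local_config(config: dict) -> dict:
    """Classify top-level config sections against the local config policy."""
    keys = set(config)
    return {
        "overrides_protocol_truth": sorted(keys & FORBIDDEN_SECTIONS),
        "unknown_sections": sorted(keys - ALLOWED_SECTIONS - FORBIDDEN_SECTIONS),
    }


def config_blocks_launch(config: dict) -> bool:
    """Only forbidden sections block launch; unknown sections are warnings."""
    return bool(review_local_config(config)["overrides_protocol_truth"])
```

Unknown sections are reported rather than rejected because 13.1 is a SHOULD, while 13.2 is a MUST NOT.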


14. Node preflight

14.1 Before full boot, operator SHOULD run preflight mode that checks:

  • binary hash
  • release package linkage
  • genesis package linkage
  • storage readiness
  • key presence
  • network configuration sanity
  • telemetry path
  • snapshot restore ability if relevant
  • local config syntax/semantic validity
  • role enablement policy

14.2 Rule

Preflight failures MUST block consensus-role startup.


15. Preflight verdicts

15.1 Recommended verdicts

  • PREFLIGHT_OK
  • PREFLIGHT_OK_WITH_WARNINGS
  • PREFLIGHT_BLOCKED
  • PREFLIGHT_SCOPE_MISMATCH
  • PREFLIGHT_KEY_FAILURE
  • PREFLIGHT_ARTIFACT_FAILURE
  • PREFLIGHT_ENV_FAILURE

15.2 Rule

Consensus roles MAY start only with PREFLIGHT_OK, or with narrowly defined PREFLIGHT_OK_WITH_WARNINGS classes approved by launch policy.
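
The verdicts and the rule in 15.2 can be sketched as an enumeration plus a gate function. The verdict names come from 15.1; the notion of named warning classes and the function `may_start_consensus` are assumptions about how a launch policy might express its approved warnings.

```python
from enum import Enum


class PreflightVerdict(Enum):
    PREFLIGHT_OK = "PREFLIGHT_OK"
    PREFLIGHT_OK_WITH_WARNINGS = "PREFLIGHT_OK_WITH_WARNINGS"
    PREFLIGHT_BLOCKED = "PREFLIGHT_BLOCKED"
    PREFLIGHT_SCOPE_MISMATCH = "PREFLIGHT_SCOPE_MISMATCH"
    PREFLIGHT_KEY_FAILURE = "PREFLIGHT_KEY_FAILURE"
    PREFLIGHT_ARTIFACT_FAILURE = "PREFLIGHT_ARTIFACT_FAILURE"
    PREFLIGHT_ENV_FAILURE = "PREFLIGHT_ENV_FAILURE"


def may_start_consensus(
    verdict: PreflightVerdict,
    warning_classes=frozenset(),
    approved_warning_classes=frozenset(),
) -> bool:
    """Gate consensus-role startup on the preflight verdict."""
    if verdict is PreflightVerdict.PREFLIGHT_OK:
        return True
    if verdict is PreflightVerdict.PREFLIGHT_OK_WITH_WARNINGS:
        # Every reported warning must be approved by launch policy. A warning
        # verdict with no listed warnings is inconsistent and is rejected.
        reported = set(warning_classes)
        return bool(reported) and reported <= set(approved_warning_classes)
    return False
```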


16. Bootstrap start sequence

16.1 Recommended sequence

  1. verify artifacts
  2. verify config
  3. run preflight
  4. initialize local data stores
  5. load genesis package
  6. derive genesis state and roots
  7. compare derived values with expected values
  8. open network connections
  9. sync or confirm initial protocol view
  10. enter validation-only mode first
  11. activate consensus role only after local and network sanity checks

16.2 Rule

Nodes SHOULD avoid jumping directly into proposer/notary role before initial validation-only bootstrap.
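
The sequence in 16.1 is ordered and fail-fast: a later step is meaningless if an earlier one failed. One way to sketch that is a runner over named check functions. The runner and its return shape are hypothetical; each callable stands in for a real check supplied by the node tooling.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_bootstrap(steps: List[Step]) -> Tuple[bool, List[str], Optional[str]]:
    """Run steps in order and stop at the first one that fails.

    Returns (all_passed, completed_step_names, failed_step_name).
    """
    completed: List[str] = []
    for name, check in steps:
        if not check():
            # Later steps are never executed after a failure.
            return False, completed, name
        completed.append(name)
    return True, completed, None
```

For example, a step list might begin with `("verify artifacts", verify_artifacts)` and end with `("activate consensus role", activate_roles)`, where both callables are operator-provided. Logging the completed list alongside the failed step gives an auditable record of how far the bootstrap progressed.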


17. Validation-only bootstrap

17.1 Purpose

This phase allows the node to confirm the following, without yet affecting the network by proposing or notarizing:

  • it understands the network correctly,
  • it sees the expected genesis,
  • it derives the same protocol state.

17.2 Recommended checks in this phase

  • peer compatibility
  • chain_id match
  • genesis_hash match
  • initial parameter state match
  • validator role eligibility match
  • no local replay mismatch

17.3 Rule

If any of these checks fail, operator MUST remain out of consensus role.


18. Role activation policy

18.1 Role activation SHOULD be explicit per role:

  • validation active
  • proposer active
  • verifier active
  • notary active

18.2 Rule

Notary role SHOULD be activated last among consensus roles unless launch process explicitly requires simultaneous activation and tooling guarantees readiness.

18.3 Additional caution

If node is healthy enough to validate but not fully healthy enough to sign, operator SHOULD keep signing roles disabled.
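
The ordering implied by sections 16 and 18 (validation first, notary last) can be sketched as a planning function. The relative order of proposer and verifier is not fixed by the manual; placing proposer before verifier here is an arbitrary assumption, as are the role strings.

```python
SIGNING_ROLES = ("proposer", "verifier", "notary")
KNOWN_ROLES = frozenset({"validation", *SIGNING_ROLES})


def plan_activation(intended_roles) -> list:
    """Return an explicit activation order for the intended roles."""
    roles = set(intended_roles)
    unknown = roles - KNOWN_ROLES
    if unknown:
        # Never silently ignore a role name; implicit roles are not allowed.
        raise ValueError("unknown roles: " + ", ".join(sorted(unknown)))
    plan = ["validation"]  # validation-only always comes first
    plan += [role for role in ("proposer", "verifier") if role in roles]
    if "notary" in roles:
        plan.append("notary")  # highest-risk role is activated last
    return plan
```

Each entry in the returned plan would still be gated by its own checklist before the operator moves to the next one.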


19. Proposer activation checklist

19.1 Before enabling proposer role, operator SHOULD verify:

  • mempool/candidate pool healthy
  • local state current
  • peer connectivity acceptable
  • no unresolved preflight warnings in consensus-critical scope
  • proposer key reachable and correct
  • telemetry for block production active

19.2 Rule

A node with uncertain local state SHOULD NOT propose.


20. Verifier activation checklist

20.1 Before enabling verifier role, operator SHOULD verify:

  • candidate validation path passes self-checks
  • vote signing path correct
  • consensus state current
  • fraud proof logging enabled
  • replay path available for anomaly investigation

20.2 Rule

Verifier role SHOULD be disabled if node cannot deterministically reproduce validation path under launch conditions.


21. Notary activation checklist

21.1 Before enabling notary role, operator MUST verify:

  • reexecution path healthy
  • finality threshold and committee view correct
  • notary key correct and isolated
  • notarization logs and evidence capture active
  • no unresolved validation divergence
  • no suspicious launch anomaly active

21.2 Rule

Notary role is highest-risk among core launch roles and SHOULD be activated only after strongest confidence checks.


22. Peer and network checks

22.1 Operator SHOULD verify:

  • peers report expected chain identity
  • enough peer diversity
  • no obvious partition
  • acceptable latency
  • expected launch peers reachable
  • peer software versions acceptable per launch policy

22.2 Rule

A node connected mostly to mismatched or suspicious peers SHOULD NOT activate consensus roles.


23. Launch window monitoring

23.1 In launch window, operator SHOULD monitor:

  • finalized epoch cadence
  • block proposal acceptance/rejection patterns
  • verifier/notary participation metrics
  • invalid object rates
  • BVM failure rates
  • witness/proof anomalies
  • governance activation anomalies
  • local resource saturation
  • key/signing path health

23.2 Rule

Launch window monitoring MUST be higher-sensitivity than steady-state operation.


24. Early epoch checks

24.1 During first epochs, operator SHOULD confirm:

  • expected genesis anchored
  • first finalized roots consistent
  • validator participation expected
  • no unexplained no-finality
  • no deterministic replay mismatch
  • no artifact scope mismatch discovered post-start

24.2 Rule

If early epoch truth is uncertain, operator SHOULD step down to validation-only or local safe mode rather than continue signing blindly.


25. Launch anomaly classes

25.1 Recommended classes

  • artifact mismatch
  • genesis mismatch
  • validator set mismatch
  • parameter state mismatch
  • consensus participation anomaly
  • no-finality anomaly
  • BVM divergence anomaly
  • witness/proof anomaly
  • governance anomaly
  • local environment anomaly
  • key/signer anomaly

25.2 Rule

Every anomaly class SHOULD map to an operator action profile.


26. Operator action profiles

26.1 Standard profiles

  • OP_OBSERVE
  • OP_VALIDATION_ONLY
  • OP_DISABLE_PROPOSER
  • OP_DISABLE_SIGNING_ALL
  • OP_LOCAL_SAFE_MODE
  • OP_LOCAL_SERVICE_HALT
  • OP_ESCALATE_INCIDENT
  • OP_RECOVERY_REPLAY

26.2 Rule

Operators SHOULD choose the least dangerous profile that preserves protocol truth and local evidence.
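
The rule in 25.2 asks for a mapping from anomaly classes to action profiles, but the manual does not publish one. The table below is purely illustrative: the class and profile names come from 25.1 and 26.1, while every pairing and the fallback are assumptions that a real launch policy would replace.

```python
ACTION_PROFILES = frozenset({
    "OP_OBSERVE", "OP_VALIDATION_ONLY", "OP_DISABLE_PROPOSER",
    "OP_DISABLE_SIGNING_ALL", "OP_LOCAL_SAFE_MODE", "OP_LOCAL_SERVICE_HALT",
    "OP_ESCALATE_INCIDENT", "OP_RECOVERY_REPLAY",
})

# Illustrative pairings only; not defined by the manual.
ANOMALY_PROFILES = {
    "artifact mismatch": ("OP_DISABLE_SIGNING_ALL", "OP_ESCALATE_INCIDENT"),
    "genesis mismatch": ("OP_LOCAL_SERVICE_HALT", "OP_ESCALATE_INCIDENT"),
    "validator set mismatch": ("OP_VALIDATION_ONLY", "OP_ESCALATE_INCIDENT"),
    "parameter state mismatch": ("OP_VALIDATION_ONLY", "OP_ESCALATE_INCIDENT"),
    "consensus participation anomaly": ("OP_OBSERVE",),
    "no-finality anomaly": ("OP_OBSERVE", "OP_ESCALATE_INCIDENT"),
    "BVM divergence anomaly": ("OP_DISABLE_SIGNING_ALL", "OP_RECOVERY_REPLAY", "OP_ESCALATE_INCIDENT"),
    "witness/proof anomaly": ("OP_VALIDATION_ONLY",),
    "governance anomaly": ("OP_VALIDATION_ONLY", "OP_ESCALATE_INCIDENT"),
    "local environment anomaly": ("OP_LOCAL_SAFE_MODE",),
    "key/signer anomaly": ("OP_DISABLE_SIGNING_ALL", "OP_ESCALATE_INCIDENT"),
}

# Assumed conservative fallback for anomalies that fit no known class.
FALLBACK_PROFILES = ("OP_DISABLE_SIGNING_ALL", "OP_ESCALATE_INCIDENT")


def profiles_for(anomaly_class: str) -> tuple:
    """Look up the action profiles for an anomaly class."""
    return ANOMALY_PROFILES.get(anomaly_class, FALLBACK_PROFILES)


def invalid_profile_names() -> set:
    """Return any profile names in the table that are not standard profiles."""
    used = {name for names in ANOMALY_PROFILES.values() for name in names}
    return (used | set(FALLBACK_PROFILES)) - ACTION_PROFILES
```

Keeping the table as data makes it easy to review, and `invalid_profile_names` guards against typos when the policy is edited.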


27. Local safe mode

27.1 Local safe mode MAY include:

  • disable proposer
  • disable verifier/notary signing
  • keep network and validation alive
  • keep metrics and logs alive
  • freeze admin changes
  • preserve snapshots
  • increase alert sensitivity

27.2 Rule

Local safe mode MUST be clearly local. It does not alter network protocol rules.


28. Local service halt

28.1 Purpose

Stop dangerous or broken local components.

28.2 May include stopping:

  • proposer service
  • notary service
  • RPC write endpoints
  • local agent integrations
  • indexer or explorer adjunct

28.3 Rule

Local service halt SHOULD be used if continuing to sign or submit is riskier than going temporarily dark.


29. Incident escalation

29.1 Operator MUST escalate when:

  • genesis mismatch discovered
  • binary/release mismatch discovered
  • deterministic divergence suspected
  • no-finality persists beyond threshold
  • conflicting notarization seen
  • key compromise suspected
  • impossible governance activation seen
  • BVM consensus-critical anomaly seen

29.2 Escalation package SHOULD include:

  • node identity
  • role status
  • exact artifact ids and hashes
  • relevant logs
  • state roots
  • observed anomaly class
  • timestamps
  • any preserved evidence refs
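
The escalation package in 29.2 can be represented as a small structured record so that nothing is forgotten under pressure. The field names mirror the list above; the JSON serialization and the class name `EscalationPackage` are assumptions, since the manual does not prescribe a wire format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EscalationPackage:
    node_identity: str
    role_status: dict        # e.g. {"proposer": "disabled", "notary": "disabled"}
    artifact_hashes: dict    # artifact id -> exact hash
    anomaly_class: str
    observed_at_utc: str     # timestamp string; format is an assumption
    state_roots: list = field(default_factory=list)
    log_refs: list = field(default_factory=list)
    evidence_refs: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so repeated exports are byte-identical."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```

Deterministic serialization lets the package itself be hashed and referenced in later incident records.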

30. Replay and rebuild actions

30.1 If the operator suspects local corruption or divergence, the operator SHOULD:

  1. disable signing roles
  2. preserve current logs and snapshots
  3. identify last trusted finalized checkpoint
  4. replay from trusted checkpoint
  5. compare derived roots and receipts
  6. re-evaluate whether node can rejoin safely

30.2 Rule

A node MUST NOT resume signing after replay mismatch without explicit incident handling and resolution.
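
Step 5 (compare derived roots) and the rule in 30.2 can be sketched as follows, assuming roots are available as ordered sequences of comparable values, one per checkpoint after the trusted starting point. Both function names are hypothetical.

```python
from typing import Optional, Sequence


def first_divergence(expected_roots: Sequence, replayed_roots: Sequence) -> Optional[int]:
    """Return the index of the first differing root, or None if identical."""
    for index, (expected, replayed) in enumerate(zip(expected_roots, replayed_roots)):
        if expected != replayed:
            return index
    if len(expected_roots) != len(replayed_roots):
        # One sequence ended early; the divergence is at the first missing entry.
        return min(len(expected_roots), len(replayed_roots))
    return None


def may_resume_signing(expected_roots, replayed_roots, incident_resolved: bool) -> bool:
    """After a replay mismatch, signing requires explicit incident resolution."""
    if first_divergence(expected_roots, replayed_roots) is None:
        return True
    return incident_resolved
```

The divergence index points the investigation at the first checkpoint worth preserving as evidence.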


31. Restart procedure

31.1 Safe restart sequence

  1. preserve state and logs
  2. verify artifacts unchanged
  3. verify no local config drift
  4. run preflight again
  5. restore from last good checkpoint if needed
  6. start validation-only
  7. re-enable signing roles gradually

31.2 Rule

Crash/restart MUST NOT imply immediate automatic re-entry into all signing roles unless launch policy explicitly permits and health checks pass.
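
The rule in 31.2 amounts to a default: after a crash, the node comes back in validation-only mode unless two conditions both hold. A minimal sketch, with hypothetical role strings and parameter names:

```python
def roles_after_restart(
    previous_roles,
    policy_allows_auto_resume: bool,
    health_checks_passed: bool,
) -> set:
    """Decide which roles are active immediately after a restart."""
    if policy_allows_auto_resume and health_checks_passed:
        # Explicitly permitted by launch policy and verified healthy.
        return set(previous_roles) | {"validation"}
    # Default: validation-only; signing roles are re-enabled gradually later.
    return {"validation"}
```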


32. Rejoin procedure after downtime

32.1 Operator SHOULD:

  • verify current release/genesis scope still same
  • verify local binary still valid for active network scope
  • sync state and compare finalized checkpoints
  • run replay spot-checks if downtime significant or incident occurred
  • enter validation-only first
  • re-enable roles only after healthy sync

32.2 Rule

Rejoin after suspicious downtime SHOULD be conservative.


33. Snapshot policy for operators

33.1 Before launch, operator SHOULD ensure:

  • initial empty/pre-genesis local snapshot policy defined
  • post-genesis snapshot available or derivable
  • periodic finalized checkpoint snapshots enabled
  • pre-restart and pre-recovery snapshots possible

33.2 Rule

Snapshots used operationally MUST be tied to canonical roots and trusted package scope.


34. Logging and audit requirements

34.1 Launch-time logs SHOULD capture:

  • binary identity
  • release package id
  • genesis package id
  • genesis_hash
  • chain_id
  • role enablement events
  • preflight verdict
  • first peer compatibility checks
  • first finalized epoch observations
  • anomalies and local safe mode transitions

34.2 Rule

If a validator cannot later prove which exact artifacts it launched with, launch audit quality is insufficient.
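
A simple completeness check can flag an insufficient launch audit record before it matters. The field names below are assumptions that mirror the list in 34.1; an empty list is treated as a valid value (for example, no anomalies occurred), while a missing or null field is not.

```python
# Field names are illustrative and follow the order of 34.1.
REQUIRED_AUDIT_FIELDS = (
    "binary_identity",
    "release_package_id",
    "genesis_package_id",
    "genesis_hash",
    "chain_id",
    "role_enablement_events",
    "preflight_verdict",
    "peer_compatibility_checks",
    "first_finalized_epochs",
    "anomaly_and_safe_mode_transitions",
)


def missing_audit_fields(record: dict) -> list:
    """Return required fields that are absent or null in the audit record."""
    return [name for name in REQUIRED_AUDIT_FIELDS if record.get(name) is None]
```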


35. Communication discipline

35.1 Operator communications SHOULD distinguish:

  • local node issue
  • release artifact issue
  • genesis package issue
  • network-wide consensus issue
  • observability-only issue

35.2 Rule

Do not label local misconfiguration as protocol fault until evidence supports it.


36. Genesis ceremony and launch ceremony integration

36.1 Operators SHOULD treat the following as explicit ceremony steps, not informal chat confirmations:

  • package verification
  • checksum/root confirmation
  • role readiness confirmation

36.2 Recommended confirmations

  • “verified genesis_hash”
  • “verified chain_id”
  • “verified binary hash”
  • “validator role ready”
  • “notary role ready”
  • “monitoring live”
  • “incident path staffed”

36.3 Rule

Ceremony statements SHOULD map to actual checks, not ritual words.


37. Launch blockers for individual operators

37.1 An operator MUST NOT activate consensus role if:

  • release artifact mismatch
  • genesis package mismatch
  • key mapping incorrect
  • preflight blocked
  • state store unhealthy
  • clock drift severe
  • signer unavailable or misconfigured
  • validator set eligibility unclear
  • telemetry/incident path absent for critical roles

37.2 Rule

Individual no-go is preferable to unsafe participation.
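
The blockers in 37.1 can be evaluated as a go/no-go gate. In this sketch each blocker is a boolean flag that the operator's tooling sets to False once the condition has been positively cleared; the flag names are assumptions. A flag that was never evaluated is treated as blocking, which matches the preference for an individual no-go over unsafe participation.

```python
# Flag names are illustrative and follow the order of 37.1.
BLOCKER_FLAGS = (
    "release_artifact_mismatch",
    "genesis_package_mismatch",
    "key_mapping_incorrect",
    "preflight_blocked",
    "state_store_unhealthy",
    "clock_drift_severe",
    "signer_unavailable_or_misconfigured",
    "validator_set_eligibility_unclear",
    "telemetry_or_incident_path_absent",
)


def active_blockers(status: dict) -> list:
    """Return blockers that are set, or that were never evaluated."""
    return [flag for flag in BLOCKER_FLAGS if status.get(flag, True)]


def may_activate_consensus(status: dict) -> bool:
    """Go only if every blocker has been explicitly cleared."""
    return not active_blockers(status)
```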


38. Post-launch restricted posture

38.1 For early epochs/days, operators SHOULD:

  • avoid unnecessary config changes
  • avoid unnecessary binary changes
  • keep signing roles conservative
  • increase snapshot frequency
  • monitor more aggressively
  • require stricter internal approval for local modifications

38.2 Rule

Early launch is a stabilization period, not an optimization period.


39. Change control during launch window

39.1 Operators SHOULD NOT during launch window:

  • swap binaries casually
  • modify local role mappings ad hoc
  • change trusted package source
  • alter genesis files
  • patch configs that affect consensus semantics
  • rotate keys without recorded reason and process

39.2 Allowed emergency changes

Only those required by incident response and already covered by runbook/process.


40. Minimal operator checklist summary

40.1 Before launch

  • verify release package
  • verify genesis package
  • verify binary hash
  • verify chain_id and genesis_hash
  • verify keys and roles
  • verify preflight
  • verify monitoring
  • verify incident path

40.2 At bootstrap

  • start validation-only
  • verify peers and state
  • enable roles in controlled order
  • watch first finalized epochs

40.3 If anomaly

  • preserve evidence
  • disable risky roles
  • escalate
  • replay/recover if needed

41. Anti-patterns

Operators SHOULD avoid:

  1. starting from unverified downloaded binaries
  2. hand-editing genesis or package files
  3. enabling all roles at once before validation-only checks
  4. continuing to sign after replay mismatch
  5. assuming peer majority means local node is correct
  6. mixing local debug config into launch production config
  7. treating missing monitoring as acceptable for notary role
  8. restarting into full signing mode automatically after crash
  9. using same unprotected hot key for all roles
  10. improvising launch confirmations without actual artifact verification

42. Formal goals

AZ-025 pursues the following goals:

42.1 Safe operator bootstrap

Validators and operators can start nodes without introducing artifact or configuration ambiguity.

42.2 Launch-role discipline

Consensus roles activate only after explicit verification and readiness checks.

42.3 Evidence-preserving anomaly handling

Early launch anomalies are contained and investigated without destroying useful evidence.

42.4 Rejoin safety

Nodes can restart or rejoin conservatively without silently poisoning consensus with local uncertainty.


43. Document formula

Validator/Operator Launch Manual = verify artifacts + verify environment + preflight node + activate roles conservatively + monitor first epochs + preserve evidence on anomaly


44. Relationship to the rest of the suite

  • AZ-022 defines the concrete genesis package.
  • AZ-017 defines the launch criterion.
  • AZ-025 defines how operators and validators actually execute that launch.

In short: AZ-017 says when you are allowed to launch; AZ-025 says how to start the nodes without breaking the launch.


45. What comes next

After AZ-025, the next document is:

AZ-026 — Genesis Ceremony and Launch Ceremony Protocol

It should fix:

  • the formal ceremony steps,
  • who confirms what,
  • in what order,
  • how approvals are closed,
  • and how the transition from package verification to network start is officially marked.

Closing

A good launch manual does not just say "start the node". It says exactly: what you verify, what you must not assume, when you are allowed to sign, when you must stop, and what evidence you keep if something does not match.

That is where truly disciplined validator operation begins.