Execution Lifecycle
Every workflow execution transitions through a well-defined set of states. Understanding these states helps you monitor executions, debug failures, and design robust workflows.
TL;DR
A workflow execution moves through: NEW → VALID → READY → acquire entities →
LOCKED → RUNNING → COMPLETED (or FAILED_SAFE / FAILED_UNSAFE).
In the diagram, green marks normal-path states, orange marks error handling and safe
failures, and red marks unsafe failures. FAILED_SAFE means nothing irreversible
happened; FAILED_UNSAFE means manual review is needed.
State Machine
stateDiagram-v2
classDef goodState stroke:green
classDef warningState stroke:orange
classDef badState stroke:red
class NEW goodState
class READY goodState
class VALID goodState
class LOCKING goodState
class LOCKED goodState
class BLOCKED_WAITING goodState
class RESOURCE_DISCOVERY goodState
class RESOURCES_DISCOVERED goodState
class SCHEDULED goodState
class RUNNING goodState
class COMPLETED goodState
class COMPLETED_ACK goodState
class FAILED_SAFE warningState
class ERROR warningState
class ROLLBACK warningState
class FAILED_UNSAFE badState
class FAILED_UNSAFE_ACK badState
class FAILED_SAFE_ACK warningState
[*] --> NEW
NEW --> VALID: resolved & validated
NEW --> FAILED_SAFE: validation failed
VALID --> READY
VALID --> BLOCKED_WAITING: awaiting conditions
READY --> RESOURCE_DISCOVERY
RESOURCE_DISCOVERY --> RESOURCES_DISCOVERED
RESOURCES_DISCOVERED --> SCHEDULED
SCHEDULED --> LOCKING
LOCKING --> LOCKED
LOCKING --> FAILED_SAFE: lock failed
LOCKED --> RUNNING
RUNNING --> COMPLETED: all steps done
RUNNING --> ERROR: step failed
ERROR --> FAILED_SAFE: pure execution
ERROR --> FAILED_UNSAFE: side effects
ERROR --> ROLLBACK: rollback initiated
ROLLBACK --> FAILED_SAFE: rollback succeeded
ROLLBACK --> FAILED_UNSAFE: rollback failed
COMPLETED --> [*]
FAILED_SAFE --> [*]
FAILED_UNSAFE --> [*]
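The diagram can also be read as a plain transition table. The following is a minimal sketch, not the engine's implementation: `TRANSITIONS`, `can_transition`, and `is_valid_path` are hypothetical names, and the `LOCKING` → `FAILED_SAFE` edge is taken from the LOCKING state description on this page.

```python
# Transition table transcribed from the documented states.
# All names here are illustrative; they are not engine APIs.
TRANSITIONS: dict[str, set[str]] = {
    "NEW": {"VALID", "FAILED_SAFE"},
    "VALID": {"READY", "BLOCKED_WAITING"},
    "BLOCKED_WAITING": set(),  # reserved; no outgoing transitions documented
    "READY": {"RESOURCE_DISCOVERY"},
    "RESOURCE_DISCOVERY": {"RESOURCES_DISCOVERED"},
    "RESOURCES_DISCOVERED": {"SCHEDULED"},
    "SCHEDULED": {"LOCKING"},
    "LOCKING": {"LOCKED", "FAILED_SAFE"},
    "LOCKED": {"RUNNING"},
    "RUNNING": {"COMPLETED", "ERROR"},
    "ERROR": {"FAILED_SAFE", "FAILED_UNSAFE", "ROLLBACK"},
    "ROLLBACK": {"FAILED_SAFE", "FAILED_UNSAFE"},
    "COMPLETED": set(),
    "FAILED_SAFE": set(),
    "FAILED_UNSAFE": set(),
}

# States that lead to the end marker in the diagram.
TERMINAL_STATES = {"COMPLETED", "FAILED_SAFE", "FAILED_UNSAFE"}


def can_transition(src: str, dst: str) -> bool:
    """Return True if the documented state machine allows src -> dst."""
    return dst in TRANSITIONS.get(src, set())


def is_valid_path(path: list[str]) -> bool:
    """Return True if every consecutive pair in path is an allowed transition."""
    return all(can_transition(a, b) for a, b in zip(path, path[1:]))
```

A table like this is handy in tests or log analysis, for example to flag an execution history that skips a state.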
States in Detail
Submission Phase
NEW
: The execution has been created. The engine resolves function block references, validates the workflow definition against the schema, and checks that all referenced function blocks are registered.
- Success → VALID
- Failure (unresolvable FB, invalid schema) → FAILED_SAFE
VALID
: The workflow is structurally correct and all dependencies are resolved.
- Normally → READY
- If external conditions are required → BLOCKED_WAITING (reserved for future use)
Acquisition Phase
READY
: The execution is ready to acquire resources. This is the entry point for the resource discovery phase.
- → RESOURCE_DISCOVERY
RESOURCE_DISCOVERY
: The engine traverses the workflow tree and creates an ACQUIRE job for each step that defines entity requirements. Workers execute each function block's acquire() method and return the entities needed.
- → RESOURCES_DISCOVERED
RESOURCES_DISCOVERED
: All entity requirements have been collected. The engine knows which devices, interfaces, and groups are needed.
- → SCHEDULED
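The discovery step amounts to walking the step tree and taking the union of every step's entity requirements. The sketch below illustrates that idea only: `Step` and `discover_entities` are hypothetical, and the entities are modelled as static data, whereas in the engine they are returned by workers running each function block's acquire() method.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for one node of the workflow tree."""
    name: str
    # Entities this step needs; in the engine these come back from acquire().
    entities: set[str] = field(default_factory=set)
    children: list["Step"] = field(default_factory=list)


def discover_entities(root: Step) -> set[str]:
    """Walk the tree depth-first and union all entity requirements."""
    required: set[str] = set()
    stack = [root]
    while stack:
        step = stack.pop()
        required |= step.entities
        stack.extend(step.children)
    return required
```

Steps without entity requirements contribute nothing to the result, which mirrors the rule that ACQUIRE jobs are created only for steps that define requirements.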
Locking Phase
SCHEDULED
: The execution is ready for locking. If the required entities are locked by another execution, the workflow waits here.
- → LOCKING
LOCKING
: The engine has sent a lock request to the CMS for all required entities.
- Success → LOCKED
- Failure → FAILED_SAFE
LOCKED
: All entities are exclusively locked. No other workflow can modify them until this execution completes.
- → RUNNING
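To make the locking semantics concrete, here is a small in-memory model. It is a sketch under assumptions, not the CMS protocol: `LockTable` and its methods are hypothetical, and the all-or-nothing behaviour (either every requested entity is locked or none is) is an assumption that this page does not state explicitly.

```python
class LockTable:
    """Hypothetical in-memory model of exclusive entity locks."""

    def __init__(self) -> None:
        # Maps entity id -> id of the execution that holds the lock.
        self._owner: dict[str, str] = {}

    def try_lock_all(self, execution_id: str, entities: set[str]) -> bool:
        """Lock every entity, or none if any is held by another execution."""
        for entity in entities:
            holder = self._owner.get(entity)
            if holder is not None and holder != execution_id:
                return False  # caller would stay in SCHEDULED and retry later
        for entity in entities:
            self._owner[entity] = execution_id
        return True

    def release_all(self, execution_id: str) -> None:
        """Drop every lock held by the given execution."""
        self._owner = {
            entity: owner
            for entity, owner in self._owner.items()
            if owner != execution_id
        }
```

In this model a failed `try_lock_all` leaves the table unchanged, so a waiting execution never holds a partial set of locks.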
Execution Phase
RUNNING
: Steps are actively executing. The engine creates EXECUTE jobs on the blackboard, processes results, and advances through the step tree.
- All steps complete → COMPLETED
- Any step fails → ERROR
Terminal States
COMPLETED
: All steps finished successfully. DB updates are applied to the CMS and locks are released.
ERROR
: A step failed. The engine classifies the failure based on purity.
- All executed steps were pure → FAILED_SAFE
- Otherwise → FAILED_UNSAFE
!!! info "Stub handler"
The ERROR state handler currently performs the pure/unsafe classification and
transitions directly. The planned rollback path (`ERROR` → `ROLLBACK`) is not yet
active.
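The purity-based classification described for ERROR can be sketched as a single predicate over the steps that ran. `ExecutedStep` and `classify_failure` are hypothetical names, and how the engine actually records purity is not covered on this page.

```python
from dataclasses import dataclass


@dataclass
class ExecutedStep:
    """Hypothetical record of a step that ran before the failure."""
    name: str
    pure: bool  # True if the step caused no side effects


def classify_failure(executed: list[ExecutedStep]) -> str:
    """Return FAILED_SAFE only if every executed step was pure."""
    if all(step.pure for step in executed):
        return "FAILED_SAFE"
    return "FAILED_UNSAFE"
```

Note that in this sketch a failure with no executed steps classifies as FAILED_SAFE, since there is nothing that could have caused a side effect.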
ROLLBACK (planned)
: When implemented, the engine will execute rollback jobs in reverse step order.
- Rollback succeeds → FAILED_SAFE
- Rollback fails → FAILED_UNSAFE
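Since rollback is not yet active, the following only illustrates the ordering described above. `run_rollback` and the `rollback` callback are hypothetical, and stopping at the first failed rollback is an assumption.

```python
from typing import Callable


def run_rollback(executed: list[str], rollback: Callable[[str], bool]) -> str:
    """Roll back executed steps in reverse order and return the final state.

    rollback(step) is expected to return True on success.
    """
    for step in reversed(executed):
        if not rollback(step):
            return "FAILED_UNSAFE"  # assumption: abort on first failure
    return "FAILED_SAFE"
```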
FAILED_SAFE
: The execution failed with no irreversible side effects. Safe to retry or discard.
FAILED_UNSAFE
: The execution failed with potential side effects. Manual review required.
Monitoring Executions
Via API
# List active executions
curl http://localhost:3030/workflow-execution/active
# Get a specific execution
curl http://localhost:3030/workflow-execution/id/<execution-id>
# List executions by state
curl http://localhost:3030/workflow-execution/state/RUNNING
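For scripted monitoring, the same endpoints can be polled until an execution reaches a terminal state. This is a minimal sketch using only the Python standard library. It assumes the `/id/<execution-id>` response is a JSON object with a `state` field, which this page does not document; `fetch_state` and `wait_for_terminal` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:3030/workflow-execution"
TERMINAL_STATES = {"COMPLETED", "FAILED_SAFE", "FAILED_UNSAFE"}


def fetch_state(execution_id: str) -> str:
    """GET one execution and return its state.

    Assumption: the response body is JSON with a top-level "state" key.
    """
    with urllib.request.urlopen(f"{BASE_URL}/id/{execution_id}") as resp:
        return json.load(resp)["state"]


def wait_for_terminal(
    execution_id: str,
    get_state: Callable[[str], str] = fetch_state,
    interval: float = 2.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the execution is terminal, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state(execution_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{execution_id} still in {state}")
        sleep(interval)
```

`get_state` and `sleep` are injectable so the loop can be exercised without a running engine.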
Via Monitor App
The Monitor App's Executions view shows all executions with their current state, progress (jobs completed/total), and drill-down to individual job results.
Stuck Execution Recovery
If an execution appears stuck:
| Symptom | Likely cause | Resolution |
|---|---|---|
| Stuck in SCHEDULED | Entities locked by another execution | Wait or abort the blocking execution |
| Stuck in RUNNING | Worker crashed or disconnected | Wait for stuck job timeout (12 min default), then jobs are auto-failed |
| Stuck in LOCKING | CMS unreachable | Check CMS connectivity, restart engine |
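The stuck-job timeout from the table can be expressed as a simple age check. This is a sketch only: `is_job_stuck` is a hypothetical helper, and measuring the timeout from the job's start time (rather than, say, a last heartbeat) is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default quoted in the table above.
STUCK_JOB_TIMEOUT = timedelta(minutes=12)


def is_job_stuck(
    started_at: datetime,
    now: Optional[datetime] = None,
    timeout: timedelta = STUCK_JOB_TIMEOUT,
) -> bool:
    """Return True if the job has been running longer than the timeout."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - started_at > timeout
```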
Executions can be aborted:
!!! warning "Limited implementation"
The abort endpoint exists but has limited functionality. It may not cleanly stop in-flight jobs or release locks in all scenarios. Use with caution in production.