
Execution Lifecycle

Every workflow execution transitions through a well-defined set of states. Understanding these states helps you monitor executions, debug failures, and design robust workflows.

TL;DR

A workflow execution moves through NEW → VALID → READY → acquire entities → LOCKED → RUNNING → COMPLETED (or FAILED_SAFE / FAILED_UNSAFE). In the diagram, green marks the normal path, orange marks failure handling and safe failures, and red marks unsafe failures. FAILED_SAFE means nothing irreversible happened; FAILED_UNSAFE means manual review is needed.

State Machine

```mermaid
stateDiagram-v2
    classDef goodState stroke:green
    classDef warningState stroke:orange
    classDef badState stroke:red

    class NEW goodState
    class READY goodState
    class VALID goodState
    class LOCKING goodState
    class LOCKED goodState
    class BLOCKED_WAITING goodState
    class RESOURCE_DISCOVERY goodState
    class RESOURCES_DISCOVERED goodState
    class SCHEDULED goodState
    class RUNNING goodState
    class COMPLETED goodState
    class COMPLETED_ACK goodState
    class FAILED_SAFE warningState
    class ERROR warningState
    class ROLLBACK warningState
    class FAILED_UNSAFE badState
    class FAILED_UNSAFE_ACK badState
    class FAILED_SAFE_ACK warningState

    [*] --> NEW
    NEW --> VALID: resolved & validated
    NEW --> FAILED_SAFE: validation failed

    VALID --> READY
    VALID --> BLOCKED_WAITING: awaiting conditions

    READY --> RESOURCE_DISCOVERY
    RESOURCE_DISCOVERY --> RESOURCES_DISCOVERED

    RESOURCES_DISCOVERED --> SCHEDULED
    SCHEDULED --> LOCKING
    LOCKING --> LOCKED
    LOCKING --> FAILED_SAFE: lock failed

    LOCKED --> RUNNING
    RUNNING --> COMPLETED: all steps done
    RUNNING --> ERROR: step failed

    ERROR --> FAILED_SAFE: pure execution
    ERROR --> FAILED_UNSAFE: side effects
    ERROR --> ROLLBACK: rollback initiated (planned)

    ROLLBACK --> FAILED_SAFE: rollback succeeded
    ROLLBACK --> FAILED_UNSAFE: rollback failed

    COMPLETED --> [*]
    FAILED_SAFE --> [*]
    FAILED_UNSAFE --> [*]
```
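
The transitions can also be read as a lookup table. The following is a minimal POSIX shell sketch, not the engine's code: `next_state`, `happy_path`, and the event labels (`ok`, `fail`, `wait`, `pure`, `side_effects`, `rollback`) are hypothetical names chosen to mirror the diagram's edge labels, and the `LOCKING` failure edge is taken from the `LOCKING` description below.

```shell
#!/bin/sh
# Hypothetical sketch: the execution state machine as a lookup table.
# Usage: next_state STATE EVENT
# Prints the next state, or returns 1 when the pair is not a transition
# in the diagram (including any event on a terminal state).
# Event names are illustrative labels, not engine identifiers.
next_state() {
    case "$1:$2" in
        NEW:ok)                  echo VALID ;;
        NEW:fail)                echo FAILED_SAFE ;;      # validation failed
        VALID:ok)                echo READY ;;
        VALID:wait)              echo BLOCKED_WAITING ;;  # reserved for future use
        READY:ok)                echo RESOURCE_DISCOVERY ;;
        RESOURCE_DISCOVERY:ok)   echo RESOURCES_DISCOVERED ;;
        RESOURCES_DISCOVERED:ok) echo SCHEDULED ;;
        SCHEDULED:ok)            echo LOCKING ;;
        LOCKING:ok)              echo LOCKED ;;
        LOCKING:fail)            echo FAILED_SAFE ;;      # lock request failed
        LOCKED:ok)               echo RUNNING ;;
        RUNNING:ok)              echo COMPLETED ;;        # all steps done
        RUNNING:fail)            echo ERROR ;;            # a step failed
        ERROR:pure)              echo FAILED_SAFE ;;
        ERROR:side_effects)      echo FAILED_UNSAFE ;;
        ERROR:rollback)          echo ROLLBACK ;;         # planned, not yet active
        ROLLBACK:ok)             echo FAILED_SAFE ;;
        ROLLBACK:fail)           echo FAILED_UNSAFE ;;
        *)                       return 1 ;;              # unknown pair or terminal state
    esac
}

# Print every state on the happy path, one per line, starting at NEW.
happy_path() {
    state=NEW
    echo "$state"
    # The loop ends when next_state returns 1 (terminal state reached).
    while next=$(next_state "$state" ok); do
        state=$next
        echo "$state"
    done
}
```

Because terminal states fall through to the default branch, `happy_path` prints the ten states from `NEW` to `COMPLETED` and then stops.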

States in Detail

Submission Phase

NEW : The execution has been created. The engine resolves function block references, validates the workflow definition against the schema, and checks that all referenced function blocks are registered.

- Success → VALID
- Failure (unresolvable function block, invalid schema) → FAILED_SAFE

VALID : The workflow is structurally correct and all dependencies are resolved.

- Normally → READY
- If external conditions are required → BLOCKED_WAITING (reserved for future use)
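
The registration check performed in `NEW` is a set-membership test. The sketch below covers only that check, not schema validation; `check_fb_refs` is a hypothetical helper, and passing the registry as one space-separated string is an assumption made for brevity.

```shell
#!/bin/sh
# Hypothetical sketch of the NEW-state registration check.
# $1:   registered function block names, space-separated
# $2..: function block names referenced by the workflow
# Prints VALID if every reference is registered; otherwise prints
# FAILED_SAFE and reports each missing name on stderr.
check_fb_refs() {
    registered=" $1 "   # pad with spaces so only whole names match
    shift
    missing=0
    for fb in "$@"; do
        case "$registered" in
            *" $fb "*) ;;   # registered, nothing to do
            *) echo "unresolvable function block: $fb" >&2; missing=1 ;;
        esac
    done
    if [ "$missing" -eq 0 ]; then echo VALID; else echo FAILED_SAFE; fi
}
```

With hypothetical names, `check_fb_refs "ping config_push" ping reboot` prints `FAILED_SAFE` and reports `reboot` on stderr. Every missing reference is reported, not just the first, which makes a failed submission easier to fix in one pass.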

Acquisition Phase

READY : The execution is ready to acquire resources. This is the entry point for the resource discovery phase.

- Next → RESOURCE_DISCOVERY

RESOURCE_DISCOVERY : The engine traverses the workflow tree and creates an ACQUIRE job for each step that defines entity requirements. Workers execute each function block's acquire() method and return the entities it needs.

- Next → RESOURCES_DISCOVERED

RESOURCES_DISCOVERED : All entity requirements have been collected. The engine knows which devices, interfaces, and groups are needed.

- Next → SCHEDULED
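
Because several steps can require the same device or interface, the discovered requirements have to be merged into one set before locking. A minimal sketch, assuming each ACQUIRE result arrives as a `<step> <entity>` line; both that format and the `collect_entities` helper are hypothetical, since the engine's job payloads are not described here.

```shell
#!/bin/sh
# Hypothetical sketch: merge entities returned by ACQUIRE jobs into the
# single set the execution must lock.
# stdin:  one "<step> <entity>" pair per line (assumed format)
# stdout: each distinct entity once, sorted
collect_entities() {
    # Keep the entity column, then sort and drop duplicates.
    awk 'NF >= 2 { print $2 }' | sort -u
}
```

For example, two steps that both touch a hypothetical `device:sw1` contribute that entity only once, so it appears once in the lock request sent from `LOCKING`.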

Locking Phase

SCHEDULED : The execution is ready for locking. If the required entities are locked by another execution, the execution waits here.

- Next → LOCKING

LOCKING : The engine has sent a lock request to the CMS for all required entities.

- Success → LOCKED
- Failure → FAILED_SAFE

LOCKED : All required entities are exclusively locked. No other execution can modify them until this execution completes.

- Next → RUNNING
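
The choice between waiting in `SCHEDULED` and moving to `LOCKING` can be pictured as an overlap check between the entities this execution needs and those held by other executions. This is an illustration under assumptions: `lock_decision` is hypothetical, entities are passed as space-separated names without glob characters, and how the engine actually learns which entities are held is not described here.

```shell
#!/bin/sh
# Hypothetical sketch of the SCHEDULED -> LOCKING decision.
# $1: entities this execution needs (space-separated)
# $2: entities currently locked by other executions (space-separated)
# Prints SCHEDULED (keep waiting) if any needed entity is held elsewhere,
# otherwise LOCKING (request locks on all needed entities at once).
lock_decision() {
    held=" $2 "   # pad with spaces so only whole names match
    for entity in $1; do
        case "$held" in
            *" $entity "*)
                echo "waiting on $entity" >&2
                echo SCHEDULED
                return 0 ;;
        esac
    done
    echo LOCKING
}
```

The sketch only proceeds when nothing overlaps, which mirrors the single all-entities lock request made in `LOCKING` rather than locking entities one at a time.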

Execution Phase

RUNNING : Steps are actively executing. The engine creates EXECUTE jobs on the blackboard, processes results, and advances through the step tree.

- All steps complete → COMPLETED
- Any step fails → ERROR

Terminal and Failure-Handling States

COMPLETED : All steps finished successfully. DB updates are applied to the CMS and locks are released.

ERROR : A step failed. The engine classifies the failure by whether the steps that executed were pure (free of side effects).

- All executed steps were pure → FAILED_SAFE
- Otherwise → FAILED_UNSAFE

!!! info "Stub handler"
    The ERROR state handler currently performs the pure/unsafe classification and
    transitions directly. The planned rollback path (`ERROR` → `ROLLBACK`) is not yet
    active.
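
The classification rule reduces to an all-pure check over the steps that actually ran. In this hypothetical sketch, `classify_failure` reads one `<step> <pure|impure>` line per executed step (an assumed input format) and treats any flag other than `pure` as impure, so an unknown value errs toward manual review.

```shell
#!/bin/sh
# Hypothetical sketch of the ERROR-state classification.
# stdin: one "<step> <pure|impure>" line per step that executed
#        before the failure (assumed format)
# Prints FAILED_SAFE if every executed step was pure, else FAILED_UNSAFE.
classify_failure() {
    verdict=FAILED_SAFE
    while read -r step purity; do
        [ -z "$step" ] && continue        # skip blank lines
        if [ "$purity" != pure ]; then
            verdict=FAILED_UNSAFE         # a side effect may have happened
            echo "non-pure step executed: $step" >&2
        fi
    done
    echo "$verdict"
}
```

With no executed steps at all, the sketch returns `FAILED_SAFE`, which matches the diagram's treatment of failures before anything has run.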

ROLLBACK (planned) : When implemented, the engine will execute rollback jobs in reverse step order.

- Rollback succeeds → FAILED_SAFE
- Rollback fails → FAILED_UNSAFE
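
Since the rollback path is planned rather than implemented, the following only sketches the described behavior: undo executed steps in reverse order and map the outcome to a terminal state. `rollback_steps` and its caller-supplied rollback command are hypothetical, and stopping at the first failed rollback is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the planned ROLLBACK phase (not implemented in
# the engine yet).
# $1:    command to run for each step's rollback; receives the step name
# stdin: executed step names, one per line, in execution order
# Prints FAILED_SAFE if every rollback succeeds, else FAILED_UNSAFE.
rollback_steps() {
    rollback_cmd=$1
    # Reverse the step list; awk is used because tac is not POSIX.
    reversed=$(awk '{ steps[NR] = $0 } END { for (i = NR; i >= 1; i--) print steps[i] }')
    for step in $reversed; do             # assumes step names without spaces
        if ! "$rollback_cmd" "$step"; then
            echo "rollback failed at step: $step" >&2
            echo FAILED_UNSAFE            # assumption: stop at first failure
            return 0
        fi
    done
    echo FAILED_SAFE
}
```

Passing `echo` as the rollback command makes the ordering visible: for steps `a`, `b`, `c` it prints `c`, `b`, `a` followed by `FAILED_SAFE`.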

FAILED_SAFE : The execution failed with no irreversible side effects. Safe to retry or discard.

FAILED_UNSAFE : The execution failed with potential side effects. Manual review required.

Monitoring Executions

Via API

```shell
# List active executions
curl http://localhost:3030/workflow-execution/active

# Get a specific execution
curl http://localhost:3030/workflow-execution/id/<execution-id>

# List executions by state
curl http://localhost:3030/workflow-execution/state/RUNNING
```
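
To follow one execution until it finishes, the `id` endpoint can be polled. This sketch assumes, without confirmation from these docs, that the response is JSON with a top-level `state` field and that `curl` and `jq` are installed; `wait_for_execution`, `is_terminal`, and the `BASE_URL` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical polling helper. Assumptions (not confirmed by these docs):
#   - GET /workflow-execution/id/<id> returns JSON with a top-level "state"
#   - curl and jq are available on PATH
BASE_URL=${BASE_URL:-http://localhost:3030}

# Succeeds (status 0) for the terminal states in the state diagram.
is_terminal() {
    case "$1" in
        COMPLETED|FAILED_SAFE|FAILED_UNSAFE) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll every $2 seconds (default 5) until the execution reaches a terminal
# state, printing each state change. Returns 1 on HTTP or parse errors.
wait_for_execution() {
    id=$1
    interval=${2:-5}
    last=""
    while :; do
        body=$(curl -fsS "$BASE_URL/workflow-execution/id/$id") || return 1
        state=$(printf '%s' "$body" | jq -r '.state') || return 1
        if [ -z "$state" ] || [ "$state" = null ]; then
            echo "no state field in response" >&2
            return 1
        fi
        if [ "$state" != "$last" ]; then
            echo "$state"
            last=$state
        fi
        is_terminal "$state" && return 0
        sleep "$interval"
    done
}
```

For example, `wait_for_execution <execution-id> 10` would print each new state and return once the execution reaches `COMPLETED`, `FAILED_SAFE`, or `FAILED_UNSAFE`. Fetching the body before piping it to `jq` keeps a failed HTTP request from being mistaken for an empty state.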

Via Monitor App

The Monitor App's Executions view shows all executions with their current state, progress (jobs completed/total), and drill-down to individual job results.

Stuck Execution Recovery

If an execution appears stuck:

| Symptom | Likely cause | Resolution |
|---|---|---|
| Stuck in SCHEDULED | Entities locked by another execution | Wait, or abort the blocking execution |
| Stuck in RUNNING | Worker crashed or disconnected | Wait for the stuck-job timeout (12 min by default), after which the jobs are auto-failed |
| Stuck in LOCKING | CMS unreachable | Check CMS connectivity, then restart the engine |
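
The stuck-job rule in the RUNNING row amounts to comparing a job's age against the timeout. A minimal sketch, assuming Unix epoch timestamps in seconds and a strict "longer than" comparison; `job_timed_out` is hypothetical, and how the engine actually measures job age is not documented here.

```shell
#!/bin/sh
# Hypothetical sketch of the stuck-job timeout check.
# $1: job start time (Unix epoch seconds)
# $2: current time (Unix epoch seconds)
# $3: timeout in minutes (optional, default 12)
# Returns 0 once the job has run longer than the timeout.
job_timed_out() {
    started=$1
    now=$2
    timeout_min=${3:-12}
    elapsed=$((now - started))
    [ "$elapsed" -gt $((timeout_min * 60)) ]
}
```

A job started 13 minutes ago (780 seconds) exceeds the 12-minute default (720 seconds), so `job_timed_out "$((now - 780))" "$now"` succeeds for any `now`, for instance `now=$(date +%s)`.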

Executions can be aborted:

```shell
curl -X DELETE http://localhost:3030/workflow-execution/<execution-id>
```

!!! warning "Limited implementation"
    The abort endpoint exists but has limited functionality: it may not cleanly stop
    in-flight jobs or release locks in every scenario. Use it with caution in production.