The Blackboard
The blackboard is the shared job queue that decouples the workflow engine from the workers. The engine writes jobs; workers read them, execute them, and write results back. This pattern allows workers to scale independently, fail without taking down the engine, and be deployed across different machines or containers.
How It Works
```mermaid
sequenceDiagram
    participant Engine as Workflow Engine
    participant BB as Blackboard
    participant W1 as Worker 1 (echo, ping)
    participant W2 as Worker 2 (configBackup)
    Engine->>BB: Create jobs
    par Workers poll
        W1->>BB: Poll for matching jobs
        W2->>BB: Poll for matching jobs
    end
    BB-->>W1: Job assignment
    BB-->>W2: Job assignment
    W1->>W1: Execute function block
    W2->>W2: Execute function block
    W1->>BB: Push result
    W2->>BB: Push result
    BB-->>Engine: Result events
```
The blackboard is not a separate service -- it is a set of database tables and REST endpoints within the workflow engine. Workers interact with it over HTTP.
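Conceptually, each blackboard entry is just a row tracking a job and its claim. A minimal sketch of such a record (the field names here are illustrative, not the engine's actual schema):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    PENDING = "PENDING"  # created by the engine, not yet claimed
    POLLED = "POLLED"    # claimed by a worker
    PUSHED = "PUSHED"    # result submitted by the worker


@dataclass
class Job:
    job_id: str
    job_type: str               # ACQUIRE, EXECUTE, or ROLLBACK
    function_block_id: str      # which function block must run this job
    state: JobState = JobState.PENDING
    worker_id: Optional[str] = None  # set when a worker claims the job


# A freshly created job starts out PENDING and unassigned.
job = Job(job_id="j-1", job_type="EXECUTE", function_block_id="echo")
print(job.state.name)  # PENDING
```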
Job Lifecycle
Every job transitions through three states:
```mermaid
stateDiagram-v2
    [*] --> PENDING: Engine creates job
    PENDING --> POLLED: Worker claims job
    POLLED --> PUSHED: Worker submits result
    PUSHED --> [*]
```

- **PENDING** -- The engine created the job but no worker has claimed it. The job sits in the queue until a worker with the matching function block polls for it.
- **POLLED** -- A worker claimed the job. The job is now assigned to that worker and will not be returned to other workers. If the worker fails to push a result within the timeout (default: 12 minutes), the job is marked as failed.
- **PUSHED** -- The worker submitted a result (success or failure). The engine processes the result and advances the workflow.
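The timeout rule on POLLED jobs can be sketched as a periodic sweep (the 12-minute default comes from the description above; the helper name and job representation are illustrative):

```python
from datetime import datetime, timedelta

POLL_TIMEOUT = timedelta(minutes=12)  # engine default per the text above


def expire_stale_jobs(jobs, now):
    """Mark POLLED jobs as failed if no result arrived within the timeout."""
    failed = []
    for job in jobs:
        if job["state"] == "POLLED" and now - job["polled_at"] > POLL_TIMEOUT:
            job["state"] = "FAILED"
            failed.append(job["id"])
    return failed


now = datetime(2024, 1, 1, 12, 0)
jobs = [
    {"id": "j-1", "state": "POLLED", "polled_at": now - timedelta(minutes=15)},
    {"id": "j-2", "state": "POLLED", "polled_at": now - timedelta(minutes=5)},
]
print(expire_stale_jobs(jobs, now))  # ['j-1'] -- only the 15-minute-old claim expires
```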
Job Types
| Type | When it's created | What the worker does |
|---|---|---|
| ACQUIRE | During the resource discovery phase | Runs the function block's `acquire()` method to determine which additional entities are needed |
| EXECUTE | During the running phase | Runs the function block's `run()` method with the full entity context |
| ROLLBACK | During error recovery | Runs the function block's `rollback()` method to undo changes |
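On the worker side, the three job types map onto the three function block methods named in the table. A minimal dispatch sketch (the `EchoBlock` class and its return values are made up for illustration):

```python
class EchoBlock:
    """A toy function block with the three lifecycle methods."""

    def acquire(self, entities):
        return []  # no additional entities needed

    def run(self, entities):
        return {"echoed": entities}

    def rollback(self, entities):
        return {"rolled_back": True}


# Each job type invokes the corresponding function block method.
DISPATCH = {
    "ACQUIRE": lambda fb, ctx: fb.acquire(ctx),
    "EXECUTE": lambda fb, ctx: fb.run(ctx),
    "ROLLBACK": lambda fb, ctx: fb.rollback(ctx),
}


def handle_job(job_type, block, context):
    return DISPATCH[job_type](block, context)


print(handle_job("EXECUTE", EchoBlock(), {"msg": "hi"}))  # {'echoed': {'msg': 'hi'}}
```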
Polling Mechanics
Workers poll the blackboard periodically (the Worker SDK defaults to every 10 seconds; the interval is configured on the worker side). Each poll request includes:

- `workerId` -- The worker's unique ID
- `functionBlockIds` -- The list of function block IDs the worker can execute
The blackboard returns pending jobs that match the worker's function blocks (default: 1 job per poll, configurable via limit). The assignment is atomic -- once a job is returned to a worker, it is marked as POLLED and will not be returned to other workers.
```mermaid
sequenceDiagram
    participant W as Worker
    participant BB as Blackboard
    loop Every 10 seconds
        W->>BB: POST /blackboard/job<br/>{workerId, functionBlockIds}
        alt Jobs available
            BB->>W: [Job 1, Job 2, ...]
            W->>W: Execute function blocks
            W->>BB: POST /blackboard/job/result<br/>{jobId, status, result}
        else No matching jobs
            BB->>W: []
        end
    end
```
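The matching-and-claiming step can be simulated in memory. This sketch (not the engine's actual code) shows why two workers polling the same queue never receive the same job -- the claim is recorded before the job is returned:

```python
def poll(queue, worker_id, function_block_ids, limit=1):
    """Atomically claim up to `limit` PENDING jobs matching the worker's blocks."""
    claimed = []
    for job in queue:
        if len(claimed) >= limit:
            break
        if job["state"] == "PENDING" and job["fb_id"] in function_block_ids:
            job["state"] = "POLLED"       # claim recorded before the job is returned
            job["worker_id"] = worker_id
            claimed.append(job)
    return claimed


queue = [
    {"id": "j-1", "fb_id": "echo", "state": "PENDING", "worker_id": None},
    {"id": "j-2", "fb_id": "configBackup", "state": "PENDING", "worker_id": None},
]

w1 = poll(queue, "worker-1", ["echo", "ping"])
w2 = poll(queue, "worker-2", ["echo", "configBackup"])
# worker-2 overlaps on "echo", but j-1 is already POLLED, so it gets j-2 instead.
print([j["id"] for j in w1], [j["id"] for j in w2])  # ['j-1'] ['j-2']
```

In the real system this claim happens inside a database transaction rather than a Python loop, but the invariant is the same: a job returned to one worker is never returned to another.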
Result Structure
When a worker pushes a result, it includes:
- `status` -- Success or failure
- `result` -- The function block's return data (conforms to the `resultDataJsonSchema`)
- `acquires` (for ACQUIRE jobs) -- Additional entity acquisition requests
- `dbUpdates` (for EXECUTE jobs) -- Entity create/patch/delete operations to apply to the CMS
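A result push for an EXECUTE job might carry a payload shaped like this. The field names come from the list above; the exact JSON layout, status values, and `dbUpdates` entry format are assumptions for illustration:

```python
import json

result_payload = {
    "jobId": "j-1",
    "status": "SUCCESS",                        # or a failure status
    "result": {"backupFile": "cfg-2024.txt"},   # must conform to resultDataJsonSchema
    "dbUpdates": [                              # EXECUTE jobs only (hypothetical shape)
        {"op": "patch", "entityId": "device-42", "fields": {"lastBackup": "2024-01-01"}}
    ],
}

# The worker would POST this body to /blackboard/job/result.
body = json.dumps(result_payload)
print(json.loads(body)["status"])  # SUCCESS
```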
The engine processes results asynchronously through an event-driven system. Each result triggers the next step in the workflow execution.
Scaling
The blackboard pattern enables horizontal scaling:
- Multiple workers can poll simultaneously. Each gets different jobs.
- Specialized workers can register only specific function blocks (e.g., a worker with Cisco credentials only handles IOS-related FBs).
- Worker failure is isolated. If a worker crashes, its jobs time out and can be reassigned.
- Engine restart preserves state. All jobs and execution state are in PostgreSQL.
Why not a message queue?
The blackboard uses PostgreSQL instead of a dedicated message queue (RabbitMQ, Redis) for simplicity and transactional consistency. Job state, workflow state, and function block registrations are all in the same database, enabling atomic operations across them. For most network automation workloads, PostgreSQL's performance is more than sufficient.
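The transactional job claim that a relational database makes easy can be sketched with SQLite standing in for PostgreSQL (in production, Postgres also offers `SELECT ... FOR UPDATE SKIP LOCKED` for contention-free claiming; the schema below is illustrative, not the engine's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, state TEXT, worker_id TEXT)")
conn.execute("INSERT INTO jobs VALUES ('j-1', 'PENDING', NULL)")
conn.commit()


def claim(conn, worker_id):
    """Claim one PENDING job; the single UPDATE makes the claim atomic."""
    cur = conn.execute(
        "UPDATE jobs SET state = 'POLLED', worker_id = ? "
        "WHERE id = (SELECT id FROM jobs WHERE state = 'PENDING' LIMIT 1)",
        (worker_id,),
    )
    conn.commit()
    return cur.rowcount  # 1 if a job was claimed, 0 otherwise


print(claim(conn, "worker-1"))  # 1 -- job claimed
print(claim(conn, "worker-2"))  # 0 -- nothing left to claim
```

Because the claim, the workflow state, and the function block registrations live in one database, a single transaction can update all of them together -- the consistency property the section above describes.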
See also:
- Worker SDK Deployment -- Configuring `URL_BLACKBOARD` and polling intervals for workers
- Worker Management -- How the engine monitors worker health and handles failures