The Blackboard
The blackboard is the shared job queue that decouples the workflow engine from the workers. The engine writes jobs; workers read them, execute them, and write results back. This pattern allows workers to scale independently, fail without taking down the engine, and be deployed across different machines or containers.
How It Works
```mermaid
sequenceDiagram
    participant Engine as Workflow Engine
    participant BB as Blackboard
    participant W1 as Worker 1 (echo, ping)
    participant W2 as Worker 2 (configBackup)
    Engine->>BB: Create jobs
    par Workers poll
        W1->>BB: Poll for matching jobs
        W2->>BB: Poll for matching jobs
    end
    BB-->>W1: Job assignment
    BB-->>W2: Job assignment
    W1->>W1: Execute function block
    W2->>W2: Execute function block
    W1->>BB: Push result
    W2->>BB: Push result
    BB-->>Engine: Result events
```
The blackboard is not a separate service -- it is a set of database tables and REST endpoints within the workflow engine. Workers interact with it over HTTP.
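Conceptually, each blackboard entry is just a row tracking a job and its claim. A minimal sketch of such a record (the field names here are illustrative, not the engine's actual schema):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    PENDING = "PENDING"  # created by the engine, not yet claimed
    POLLED = "POLLED"    # claimed by a worker
    PUSHED = "PUSHED"    # result submitted by the worker


@dataclass
class Job:
    job_id: str
    job_type: str               # ACQUIRE, EXECUTE, or ROLLBACK
    function_block_id: str      # which function block must run this job
    state: JobState = JobState.PENDING
    worker_id: Optional[str] = None  # set when a worker claims the job


# A freshly created job starts out PENDING and unassigned.
job = Job(job_id="j-1", job_type="EXECUTE", function_block_id="echo")
print(job.state.name)  # PENDING
```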
Job Lifecycle
Every job transitions through three states:
```mermaid
stateDiagram-v2
    [*] --> PENDING: Engine creates job
    PENDING --> POLLED: Worker claims job
    POLLED --> PUSHED: Worker submits result
    PUSHED --> [*]
```

- **PENDING** -- The engine created the job but no worker has claimed it. The job sits in the queue until a worker with the matching function block polls for it.
- **POLLED** -- A worker claimed the job. The job is now assigned to that worker and will not be returned to other workers. If the worker fails to push a result within the timeout (default: 12 minutes), the job is marked as failed.
- **PUSHED** -- The worker submitted a result (success or failure). The engine processes the result and advances the workflow.
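The timeout rule on POLLED jobs can be sketched as a periodic sweep (the 12-minute default comes from the description above; the helper name and job representation are illustrative):

```python
from datetime import datetime, timedelta

POLL_TIMEOUT = timedelta(minutes=12)  # engine default per the text above


def expire_stale_jobs(jobs, now):
    """Mark POLLED jobs as failed if no result arrived within the timeout."""
    failed = []
    for job in jobs:
        if job["state"] == "POLLED" and now - job["polled_at"] > POLL_TIMEOUT:
            job["state"] = "FAILED"
            failed.append(job["id"])
    return failed


now = datetime(2024, 1, 1, 12, 0)
jobs = [
    {"id": "j-1", "state": "POLLED", "polled_at": now - timedelta(minutes=15)},
    {"id": "j-2", "state": "POLLED", "polled_at": now - timedelta(minutes=5)},
]
print(expire_stale_jobs(jobs, now))  # ['j-1'] -- only the 15-minute-old claim expires
```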
Job Types
| Type | When it's created | What the worker does |
|---|---|---|
| ACQUIRE | During the resource discovery phase | Runs the function block's `acquire()` method to determine which additional entities are needed |
| EXECUTE | During the running phase | Runs the function block's `run()` method with the full entity context |
| ROLLBACK | During error recovery | Runs the function block's `rollback()` method to undo changes |
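On the worker side, the three job types map onto the three function block methods named in the table. A minimal dispatch sketch (the `EchoBlock` class and its return values are made up for illustration):

```python
class EchoBlock:
    """A toy function block with the three lifecycle methods."""

    def acquire(self, entities):
        return []  # no additional entities needed

    def run(self, entities):
        return {"echoed": entities}

    def rollback(self, entities):
        return {"rolled_back": True}


# Each job type invokes the corresponding function block method.
DISPATCH = {
    "ACQUIRE": lambda fb, ctx: fb.acquire(ctx),
    "EXECUTE": lambda fb, ctx: fb.run(ctx),
    "ROLLBACK": lambda fb, ctx: fb.rollback(ctx),
}


def handle_job(job_type, block, context):
    return DISPATCH[job_type](block, context)


print(handle_job("EXECUTE", EchoBlock(), {"msg": "hi"}))  # {'echoed': {'msg': 'hi'}}
```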
Polling Mechanics
Workers poll the blackboard periodically (the Worker SDK defaults to every 10 seconds; the interval is configured on the worker side). Each poll request includes:

- `workerId` -- The worker's unique ID
- `functionBlockIds` -- The list of function block IDs the worker can execute
The blackboard returns pending jobs that match the worker's function blocks (default: 1 job per poll, configurable via limit). The assignment is atomic -- once a job is returned to a worker, it is marked as POLLED and will not be returned to other workers.
```mermaid
sequenceDiagram
    participant W as Worker
    participant BB as Blackboard
    loop Every 10 seconds
        W->>BB: POST /blackboard/job<br/>{workerId, functionBlockIds}
        alt Jobs available
            BB->>W: [Job 1, Job 2, ...]
            W->>W: Execute function blocks
            W->>BB: POST /blackboard/job/result<br/>{jobId, status, result}
        else No matching jobs
            BB->>W: []
        end
    end
```
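The matching-and-claiming step can be simulated in memory. This sketch (not the engine's actual code) shows why two workers polling the same queue never receive the same job -- the claim is recorded before the job is returned:

```python
def poll(queue, worker_id, function_block_ids, limit=1):
    """Atomically claim up to `limit` PENDING jobs matching the worker's blocks."""
    claimed = []
    for job in queue:
        if len(claimed) >= limit:
            break
        if job["state"] == "PENDING" and job["fb_id"] in function_block_ids:
            job["state"] = "POLLED"       # claim recorded before the job is returned
            job["worker_id"] = worker_id
            claimed.append(job)
    return claimed


queue = [
    {"id": "j-1", "fb_id": "echo", "state": "PENDING", "worker_id": None},
    {"id": "j-2", "fb_id": "configBackup", "state": "PENDING", "worker_id": None},
]

w1 = poll(queue, "worker-1", ["echo", "ping"])
w2 = poll(queue, "worker-2", ["echo", "configBackup"])
# worker-2 overlaps on "echo", but j-1 is already POLLED, so it gets j-2 instead.
print([j["id"] for j in w1], [j["id"] for j in w2])  # ['j-1'] ['j-2']
```

In the real system this claim happens inside a database transaction rather than a Python loop, but the invariant is the same: a job returned to one worker is never returned to another.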
Result Structure
When a worker pushes a result, it includes:
- `status` -- Success or failure
- `result` -- The function block's return data (conforms to the `resultDataJsonSchema`)
- `acquires` (for ACQUIRE jobs) -- Additional entity acquisition requests
- `dbUpdates` (for EXECUTE jobs) -- Entity create/patch/delete operations to apply to the CMS
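A result push for an EXECUTE job might carry a payload shaped like this. The field names come from the list above; the exact JSON layout, status values, and `dbUpdates` entry format are assumptions for illustration:

```python
import json

result_payload = {
    "jobId": "j-1",
    "status": "SUCCESS",                        # or a failure status
    "result": {"backupFile": "cfg-2024.txt"},   # must conform to resultDataJsonSchema
    "dbUpdates": [                              # EXECUTE jobs only (hypothetical shape)
        {"op": "patch", "entityId": "device-42", "fields": {"lastBackup": "2024-01-01"}}
    ],
}

# The worker would POST this body to /blackboard/job/result.
body = json.dumps(result_payload)
print(json.loads(body)["status"])  # SUCCESS
```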
The engine processes results asynchronously through an event-driven system. Each result triggers the next step in the workflow execution.
Scaling
The blackboard pattern enables horizontal scaling:
- Multiple workers can poll simultaneously. Each gets different jobs.
- Specialized workers can register only specific function blocks (e.g., a worker with Cisco credentials only handles IOS-related FBs).
- Worker failure is isolated. If a worker crashes, its jobs time out and can be reassigned.
- Engine restart preserves state. All jobs and execution state are in PostgreSQL.
Why not a message queue?
The blackboard uses PostgreSQL instead of a dedicated message queue (RabbitMQ, Redis) for simplicity and transactional consistency. Job state, workflow state, and function block registrations are all in the same database, enabling atomic operations across them. For most network automation workloads, PostgreSQL's performance is more than sufficient.
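The transactional job claim that a relational database makes easy can be sketched with SQLite standing in for PostgreSQL (in production, Postgres also offers `SELECT ... FOR UPDATE SKIP LOCKED` for contention-free claiming; the schema below is illustrative, not the engine's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, state TEXT, worker_id TEXT)")
conn.execute("INSERT INTO jobs VALUES ('j-1', 'PENDING', NULL)")
conn.commit()


def claim(conn, worker_id):
    """Claim one PENDING job; the single UPDATE makes the claim atomic."""
    cur = conn.execute(
        "UPDATE jobs SET state = 'POLLED', worker_id = ? "
        "WHERE id = (SELECT id FROM jobs WHERE state = 'PENDING' LIMIT 1)",
        (worker_id,),
    )
    conn.commit()
    return cur.rowcount  # 1 if a job was claimed, 0 otherwise


print(claim(conn, "worker-1"))  # 1 -- job claimed
print(claim(conn, "worker-2"))  # 0 -- nothing left to claim
```

Because the claim, the workflow state, and the function block registrations live in one database, a single transaction can update all of them together -- the consistency property the section above describes.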
See also:
- Worker SDK Deployment -- Configuring `URL_BLACKBOARD` and polling intervals for workers
- Worker Management -- How the engine monitors worker health and handles failures