# Production Patterns

Patterns for running neops workers reliably at scale.
## Container Deployment

A minimal Dockerfile for a worker:

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --no-dev
COPY . .

CMD ["uv", "run", "neops_worker"]
```
Key considerations:

- Pin the Python version to match your development environment.
- The worker imports configuration from the project's config module — ensure it is available in the container image.
- Use `.env` for configuration or inject environment variables via your orchestrator (Docker Compose, Kubernetes).
- Exclude test code via `.dockerignore` rather than a selective `COPY`.
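A `.dockerignore` along these lines keeps tests and local state out of the image (the entries are illustrative; adjust them to your project layout):

```
# .dockerignore — excluded from the build context
tests/
.venv/
.env
__pycache__/
*.pyc
.git/
```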
## Docker Compose

```yaml
services:
  worker:
    build: .
    environment:
      URL_BLACKBOARD: http://workflow-engine:3030
      DIR_FUNCTION_BLOCKS: ./my_function_blocks
      WORKER_NAME: docker-worker-01
    restart: unless-stopped
    depends_on:
      - workflow-engine
```
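Compose can also run several instances of the same worker service. Note that with a fixed `WORKER_NAME` every replica would report the same name; if names must be unique, drop the fixed value and derive one per instance (for example from the container hostname):

```shell
# Start three replicas of the worker service
docker compose up -d --scale worker=3
```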
## Scaling Workers

### Horizontal scaling

Run multiple worker instances. Each worker:

- Registers independently with the workflow engine.
- Polls for jobs it can execute (based on its registered function blocks).
- Processes one job at a time (`max_workers=1`).
The workflow engine distributes jobs across available workers. To handle more concurrent work, add more worker instances.
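The engine's actual dispatch logic is not shown here; the sketch below only illustrates the behaviour just described — one job per idle worker, matched on registered function blocks — with illustrative names and data structures:

```python
def dispatch(jobs, workers):
    """Assign each job to an idle worker that registered the needed
    function block (max_workers=1: a worker takes one job at a time).

    jobs:    list of (job_id, block_name) tuples
    workers: dict mapping worker name -> set of registered block names
    Returns a dict of job_id -> assigned worker name; jobs with no
    capable idle worker stay queued (omitted from the result).
    """
    assignments = {}
    idle = set(workers)
    for job_id, block in jobs:
        # deterministic pick: first capable idle worker by name
        capable = sorted(w for w in idle if block in workers[w])
        if capable:
            worker = capable[0]
            idle.remove(worker)
            assignments[job_id] = worker
    return assignments
```

Adding more worker instances simply enlarges the idle pool, which is why horizontal scaling increases concurrency without any coordination between workers.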
### Specialization

Different workers can register different function block packages:

| Worker | `DIR_FUNCTION_BLOCKS` | Handles |
|---|---|---|
| `config-worker` | `./config_blocks` | Config backup, push, compliance |
| `inventory-worker` | `./inventory_blocks` | Discovery, inventory collection |
| `monitoring-worker` | `./monitoring_blocks` | Health checks, SNMP polling |
This lets you scale each concern independently and isolate failures.
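One way to express this specialization in Compose — service names and environment values here are illustrative, mirroring the table above:

```yaml
services:
  config-worker:
    build: .
    environment:
      URL_BLACKBOARD: http://workflow-engine:3030
      DIR_FUNCTION_BLOCKS: ./config_blocks
      WORKER_NAME: config-worker-01
  inventory-worker:
    build: .
    environment:
      URL_BLACKBOARD: http://workflow-engine:3030
      DIR_FUNCTION_BLOCKS: ./inventory_blocks
      WORKER_NAME: inventory-worker-01
```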
## Health Monitoring

### Heartbeat

The worker sends heartbeats every `HEARTBEAT_INTERVAL` seconds. If the workflow engine stops receiving heartbeats, it marks the worker as expired and re-queues its in-flight job.
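The expiry rule amounts to a timeout check. The sketch below assumes a worker is expired after three missed heartbeats — that threshold is an illustrative assumption, not a documented engine value:

```python
def is_expired(last_heartbeat, now, interval, missed_allowed=3):
    """Return True once the worker has gone silent for longer than
    `missed_allowed` heartbeat intervals.

    last_heartbeat, now: timestamps in seconds (any monotonic clock)
    interval:            HEARTBEAT_INTERVAL in seconds
    missed_allowed:      tolerated consecutive misses (assumed: 3)
    """
    return (now - last_heartbeat) > missed_allowed * interval
```

Allowing a few missed beats avoids flapping on transient network hiccups while still bounding how long a dead worker can hold a job.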
### Log-based monitoring

The worker logs structured events at key lifecycle points:

| Event | Level | Indicates |
|---|---|---|
| `NEOPS Worker starting...` | INFO | Worker initializing |
| `Found N function block(s)` | INFO | Discovery completed |
| `Registering worker with backend...` | INFO | Registration in progress |
| `Processing N job(s)` | INFO | Jobs received and processing |
| `Received SIGTERM. Shutting down...` | INFO | Graceful shutdown initiated |
| `Shutdown requested, skipping N remaining job(s)` | WARNING | Shutdown during job batch |
| `Worker registration failed` | ERROR | Cannot reach workflow engine |
| `Worker expired! Backend rejected ping with 404.` | ERROR | Worker invalidated by backend |
Forward these logs to your monitoring stack (ELK, Grafana Loki, Datadog) for alerting.
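A minimal sketch of turning those log lines into alerts. The event strings come from the table above; the severity policy (page on ERROR events, ticket on the shutdown WARNING) and the `classify` helper are illustrative — real deployments would express this as alert rules in the monitoring stack:

```python
import re

# Known lifecycle events mapped to an example alerting action.
ALERT_RULES = [
    (re.compile(r"Worker expired!"), "page"),
    (re.compile(r"Worker registration failed"), "page"),
    (re.compile(r"Shutdown requested, skipping \d+ remaining job"), "ticket"),
]

def classify(line):
    """Return the alert action for a worker log line, or 'ignore'."""
    for pattern, action in ALERT_RULES:
        if pattern.search(line):
            return action
    return "ignore"
```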
## Error Recovery
| Failure | Worker behavior | Engine behavior |
|---|---|---|
| Network partition | Misses heartbeats, reconnects on recovery | Marks worker expired, re-queues jobs |
| Function block exception | Reports failure result to blackboard | Marks job as FAILED_SAFE or FAILED_UNSAFE based on purity |
| Worker crash | Process exits | Detects missed heartbeats, re-queues |
| Workflow engine restart | Worker retries API calls | Re-accepts worker registrations |
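The "worker retries API calls" behaviour can be approximated with exponential backoff. The helper below, its defaults, and the use of `ConnectionError` are illustrative assumptions, not the worker's actual implementation:

```python
import time

def retry_with_backoff(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `call()` and retry on ConnectionError, doubling the delay
    each attempt (1s, 2s, 4s, ...). Re-raises after the last attempt.
    Useful while the workflow engine restarts and briefly refuses
    connections.
    """
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Capping attempts (rather than retrying forever) lets the process exit with an error that your orchestrator's restart policy can then handle.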
## Security Considerations

- Credentials: Device credentials live in the CMS. Workers access them through `WorkflowContext`. Never store credentials in function block code or environment variables.
- Network access: Workers need network access to both the workflow engine API and the managed devices. Use network segmentation to limit blast radius.
- TLS: Configure `URL_BLACKBOARD` with `https://` in production. The underlying HTTP client respects standard TLS settings.
- Least privilege: Each worker only needs access to the devices its function blocks manage. Use separate workers for different network zones.
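A startup guard along these lines can enforce the `https://` requirement before the worker registers. `check_blackboard_url` is a hypothetical helper for illustration, not part of neops:

```python
from urllib.parse import urlparse

def check_blackboard_url(url, require_tls=True):
    """Fail fast if URL_BLACKBOARD is not https:// when TLS is
    required (i.e. in production). Returns the URL unchanged when it
    passes the check.
    """
    scheme = urlparse(url).scheme
    if require_tls and scheme != "https":
        raise ValueError(f"URL_BLACKBOARD must use https://, got {scheme}://")
    return url
```

Failing at startup surfaces a misconfigured plaintext endpoint immediately instead of silently sending credentials-adjacent traffic over HTTP.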