
Invariants

Eight rules a PR cannot violate. Each is short. Re-read the relevant one before any non-trivial change to server.py, lab_manager.py, or netlab/connector.py.

A change cannot break any of these without coordinating with every consumer of the surface area. Each entry: what the rule is, why it exists, and what breaks if it goes away.

The mechanics that enforce these rules — _run_blocking, the LabManager singleton, atexit teardown, the test-stubbing pattern — live in their own pages under Internals. Before touching server.py, lab_manager.py, or netlab/connector.py, read both this page and the relevant Internals page.


One server instance per host

The entrypoint takes a non-blocking FileLock at a fixed path under the system temp directory. A second instance attempting to start on the same host fails the lock, logs the running owner’s pid/user/host/bind/cmd, and exits with status 1.

Why: Two servers would race on the single Netlab default instance and corrupt lab state mid-test.
What breaks: Concurrent netlab up/netlab down from two processes; orphaned containers; queue corruption.

Stale-lock recovery procedure: Administration → Stale-lock recovery.
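
The start-up guard described in this section can be sketched as follows. This is a minimal illustration, not the server's code: it uses the standard-library fcntl.flock (POSIX only) as a stand-in for the FileLock the entrypoint takes, and LOCK_PATH, try_acquire_instance_lock, and read_lock_owner are hypothetical names. The sketch records only the pid, whereas the real server reports user, host, bind, and cmd as well.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; the real server uses its own fixed path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "remote-lab-sketch.lock")


def try_acquire_instance_lock(path: str = LOCK_PATH):
    """Return an open file holding the lock, or None if another holder exists."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Record the owner so a losing instance can report who holds the lock.
    handle.seek(0)
    handle.truncate()
    handle.write(f"pid={os.getpid()}\n")
    handle.flush()
    return handle  # keep this object alive; closing it releases the lock


def read_lock_owner(path: str = LOCK_PATH) -> str:
    """Read the owner line written by the current lock holder."""
    with open(path) as f:
        return f.read().strip()
```

An entrypoint built on this sketch would call try_acquire_instance_lock() once at start-up and, on None, log read_lock_owner() and exit with status 1.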


One lab per host

Netlab itself only manages one topology per host. LabManager enforces this twice — a class-level singleton in-process, plus a system-wide FileLock for cross-process serialization.

Why: The two layers exist for different threats: the singleton stops two async tasks in the server from racing; the file lock stops a separate Python process (e.g. a developer running local netlab by hand alongside the server) from trampling shared state.
What breaks: Two callers’ lab state collide; one tears down the other’s containers mid-test.

Mechanics: Internals: LabManager singleton & locking.
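
The in-process layer can be pictured with a small class-level singleton. This is a sketch under assumptions, not LabManager itself: LabManagerSketch and its active_topology attribute are hypothetical, and the cross-process FileLock layer is omitted.

```python
import threading


class LabManagerSketch:
    """Hypothetical stand-in for LabManager; only the singleton layer is shown."""

    _instance = None
    _instance_guard = threading.Lock()

    def __new__(cls):
        # Guard creation so two threads cannot both build an instance.
        with cls._instance_guard:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.active_topology = None  # shared, process-wide state
                cls._instance = instance
            return cls._instance
```

Every caller in the process gets the same object, so there is exactly one place where "which lab is up" is recorded. A separate process gets its own copy of the class, which is why the system-wide file lock is still needed.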


Topology identity is the SHA-256 of file content

Not the filename. Two files with different names but identical bytes are the same topology, so reuse=True on the second upload attaches to the running lab. Edit one byte and it is a new topology: reuse=True will refuse to attach, and the request either triggers a teardown and restart (if ref == 0) or returns 423.

Why: Most systems key on filename. This one doesn’t, so a test helper that copies a vendored topology into each test’s workdir still gets reuse for free.
What breaks: Code that depends on filename equality is wrong; reuse stops working across copy-paste topologies and CI suddenly pays Netlab boot cost on every test.
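
Content-based identity is easy to demonstrate with hashlib. The helper name topology_id and the file names below are hypothetical; the sketch only shows that the digest depends on bytes, not on the path.

```python
import hashlib
import tempfile
from pathlib import Path


def topology_id(path: Path) -> str:
    """Hex SHA-256 of the file's bytes; the filename plays no part."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir, "vendored.yml")
    copy = Path(workdir, "test_case_copy.yaml")
    edited = Path(workdir, "edited.yml")

    original.write_bytes(b"nodes: [r1, r2]\n")
    copy.write_bytes(b"nodes: [r1, r2]\n")    # same bytes, different name
    edited.write_bytes(b"nodes: [r1, r3]\n")  # one byte changed

    same_identity = topology_id(original) == topology_id(copy)
    new_identity = topology_id(original) != topology_id(edited)
```

Here same_identity and new_identity are both True: the copy would qualify for reuse, while the edited file counts as a different topology.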

.yml and .yaml are both accepted

Both extensions, case-insensitive, pass prepare_workdir’s suffix check (.yml, .yaml, .YML, .YAML). The HTTP layer accepts the same set, so the surface is uniform end-to-end.

Why: There is no asymmetry between what POST /lab validates and what LabManager accepts. Consumers shouldn’t have to remember which extension this project happens to prefer.
What breaks: If you add a new entry point that copies a topology and forget to mirror this check, callers see “topology accepted, then mysteriously rejected” failures with no clear cause.

If you add a new entry point that copies a topology, mirror this check (suffix in (".yml", ".yaml") after .lower()). Anything else — .json, .txt, no extension — should raise loudly.
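
A mirrored check could look like the sketch below. The function name require_topology_suffix and the choice of ValueError are assumptions for illustration; the actual exception raised by prepare_workdir is not shown on this page.

```python
from pathlib import Path

ALLOWED_SUFFIXES = (".yml", ".yaml")


def require_topology_suffix(path: str) -> Path:
    """Return the path if its extension is acceptable, otherwise raise."""
    candidate = Path(path)
    # Lower-casing first is what makes .YML and .YAML pass.
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported topology extension: {candidate.name!r}")
    return candidate
```

Calling it with "lab.YML" or "lab.yaml" returns the path; "lab.json", "lab.txt", and a bare "lab" all raise.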


X-Session-ID is the only access boundary on /lab/*

There is no Bearer token, no mTLS, no tenant header. The /lab/* endpoints gate on a header lookup that confirms the session exists and is ACTIVE. Non-active sessions get 423 Locked; unknown sessions get 404.

/session/heartbeat is gated more loosely: it only requires the session to exist (so a WAITING session can keep its queue slot alive). This asymmetry is deliberate — see Session Queue → access boundary.

Why: The service is internal-trust. Any deployment without a VPN enclosure exposes a lab host to the open internet.
What breaks: Adding any other auth path (Bearer, mTLS, header magic) without removing this one creates a confused threat model: callers get to choose which boundary to bypass.

Full posture: Security model.
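
The two gates can be summarised as a pair of pure functions. This is a sketch of the decision logic only, with no HTTP framework involved: SessionState, lab_gate_status, and heartbeat_gate_status are hypothetical names, 200 stands in for "request proceeds", and returning 404 for an unknown session on the heartbeat path is an assumption.

```python
from enum import Enum
from typing import Mapping


class SessionState(Enum):
    WAITING = "waiting"
    ACTIVE = "active"


def lab_gate_status(sessions: Mapping[str, SessionState], session_id: str) -> int:
    """Gate for /lab/*: the session must exist and be ACTIVE."""
    state = sessions.get(session_id)
    if state is None:
        return 404  # unknown session
    if state is not SessionState.ACTIVE:
        return 423  # known but not active: Locked
    return 200


def heartbeat_gate_status(sessions: Mapping[str, SessionState], session_id: str) -> int:
    """Gate for /session/heartbeat: the session only has to exist."""
    return 200 if session_id in sessions else 404  # 404 here is an assumption
```

A WAITING session therefore gets 423 from lab_gate_status but passes heartbeat_gate_status, which is the asymmetry that lets it keep its queue slot alive.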


*Dto suffix on Pydantic request/response models

Every request and response model in neops_remote_lab.models.* ends in Dto: SessionInfoDto, CreateSessionResponseDto, LabStatusDto, AcquireResponseDto, DeviceInfoDto. Code review will reject a PR that introduces a model without the suffix.

Why: The convention lets a reader scan a file and immediately distinguish wire-format models from internal types.
What breaks: Mixing them is a recipe for accidentally serializing internal state to the HTTP surface.

CVE-pinned dependencies

Several entries in pyproject.toml carry # CVE-* comments:

"starlette>=0.49.1",   # CVE-2025-62727 fix
"filelock>=3.20.1,<4", # CVE-2025-68146 fix
"pytest>=9.0.3,<10",   # CVE-2025-71176 fix

When upgrading, preserve the comment and pick a version that still includes the patch. Then re-run make audit (pip-audit --strict) to confirm.

Why: The convention is enforced by code review and by make audit in CI. It is not enforced by tooling alone (comments can be deleted accidentally), so treat them as load-bearing.
What breaks: Deleting a # CVE-* comment loses the silent metadata that justifies the pin; the next upgrade may regress a security patch.
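
Because the comments are load-bearing, a reviewer may want a quick way to see whether an upgrade dropped one. The helper below is a hypothetical sketch, not part of the project's tooling: it assumes pins are written one per line as a quoted requirement followed by a # CVE-... comment, as in the excerpt above.

```python
import re

# Matches: "requirement spec",   # CVE-YYYY-NNNNN ...
CVE_PIN = re.compile(r'"(?P<spec>[^"]+)",?\s*#\s*(?P<cve>CVE-\d{4}-\d+)')


def cve_pins(pyproject_text: str) -> dict[str, str]:
    """Map each CVE id found in a trailing comment to its requirement string."""
    return {m["cve"]: m["spec"] for m in CVE_PIN.finditer(pyproject_text)}


def lost_cve_comments(before: str, after: str) -> set[str]:
    """CVE ids annotated in the old text but missing from the new one."""
    return set(cve_pins(before)) - set(cve_pins(after))
```

Running lost_cve_comments on the old and new pyproject.toml contents returns an empty set when every annotation survived. It does not check that the new version still contains the patch; that remains the job of make audit.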

One remote_lab_fixture per test (collection-time)

A test that depends on more than one fixture created by remote_lab_fixture fails at pytest collection with ValueError. The plugin walks fixture metadata at collection time, so the failure is immediate.

Why: A runtime failure (the second acquire would loop in the 423 polling path forever, because the first acquire’s session still holds the host) is much harder to diagnose than a clear ValueError during collection.
What breaks: Removing the collection-time check turns a clear error into a silent, several-minute hang.

If you need to exercise two topologies in the same test process, use reuse_lab=True on one and split into two tests. The plugin reorders by fixture rank to keep tests against the same lab contiguous; see Pytest Fixtures → Test execution ordering.
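
The shape of the collection-time check can be sketched without pytest. The function below is hypothetical and does not reproduce the plugin's metadata walk: it assumes the caller already knows which fixture names a test requests and which of them were created by remote_lab_fixture.

```python
from typing import Iterable, Optional


def check_single_lab_fixture(
    test_name: str,
    requested_fixtures: Iterable[str],
    lab_fixtures: Iterable[str],
) -> Optional[str]:
    """Return the one lab fixture a test uses, or None; raise if it uses several."""
    used = sorted(set(requested_fixtures) & set(lab_fixtures))
    if len(used) > 1:
        raise ValueError(
            f"{test_name} depends on {len(used)} remote lab fixtures "
            f"({', '.join(used)}); only one is allowed per test"
        )
    return used[0] if used else None
```

Run once per collected test, a check like this fails before any lab is acquired, so the second acquire never reaches the 423 polling path.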


See also